do not merge - this is a concept - feat: acquisition checkpoint #216

Closed
wants to merge 1 commit into from
3 changes: 2 additions & 1 deletion package.json
@@ -13,7 +13,8 @@
     "docs": "npx jsdoc2md -c .jsdoc.json --files 'src/*.js' > docs/API.md",
     "semantic-release": "semantic-release",
     "semantic-release-dry": "semantic-release --dry-run --branches $CI_BRANCH 1.x main",
-    "prepare": "husky install"
+    "prepare": "husky install",
+    "generate-bloom-filter": "node test/generateAcquisitionBloomFilters.js"
   },
   "repository": {
     "type": "git",
53 changes: 50 additions & 3 deletions src/index.js
@@ -11,10 +11,12 @@
  */
 const { sampleRUM } = window.hlx.rum;

+const basicHash = (string, modulo) => Array.from(string)
+  .map((a) => a.charCodeAt(0))
+  .reduce((a, b) => a + b, 1) % modulo;
+
 const fflags = {
-  has: (flag) => fflags[flag].indexOf(Array.from(window.origin)
-    .map((a) => a.charCodeAt(0))
-    .reduce((a, b) => a + b, 1) % 1371) !== -1,
+  has: (flag) => fflags[flag].indexOf(basicHash(window.origin, 1371)) !== -1,
   enabled: (flag, callback) => fflags.has(flag) && callback(),
   disabled: (flag, callback) => !fflags.has(flag) && callback(),
   onetrust: [543, 770, 1136],
@@ -269,3 +271,48 @@ fflags.enabled('email', () => {
params.filter((param) => regex.test(param)).forEach((param) => sampleRUM('email', { source: network, target: param }));
});
});
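
The refactored `fflags.has` enables a flag when the hash of the page origin appears in that flag's list. Below is a minimal standalone sketch of that lookup. It assumes the origin is passed in as a parameter instead of being read from `window.origin`; the `hasFlag` helper and the sample origin are hypothetical and not part of the PR.

```javascript
// Copied from the diff: sum of char codes (seeded with 1), reduced modulo `modulo`.
const basicHash = (string, modulo) => Array.from(string)
  .map((a) => a.charCodeAt(0))
  .reduce((a, b) => a + b, 1) % modulo;

// Flag list copied from the diff; each entry is a precomputed origin hash.
const flags = { onetrust: [543, 770, 1136] };

// Hypothetical helper: the same check as fflags.has, but with an explicit
// origin argument so it can run outside a browser.
const hasFlag = (flag, origin) => flags[flag].indexOf(basicHash(origin, 1371)) !== -1;

console.log(basicHash('abc', 1371)); // 295 (97 + 98 + 99 + 1)
console.log(hasFlag('onetrust', 'https://www.example.com'));
```

Because the hash is a plain sum of character codes, it does not depend on character order, so different origins can share a hash value.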

// acquisition checkpoint
(() => {
const sanitize = (str) => (str || '').toLowerCase().replace(/[^a-zA-Z0-9]/, '');
const toBinary = (s) => Array.from(s, (c) => parseInt(c, 16).toString(2).padStart(4, '0')).join('');
const moduli = [239, 241, 251]; // prime numbers smaller than 256
const knownVendors = toBinary('fbdef75ff9f4dedbfdeaba8f21e7884aebf67cfde6eefeea3b8ff32c6fb68a40'); // known vendors bloom filter
const categories = {
Collaborator

You could probably simplify even further with a regex approach.

affiliate: ['aff', 'affiliate', 'affiliatemarketing'],
Collaborator

Since you check for string inclusion… the 1st actually covers the other 2

audio: ['spotify'],
Contributor

The things that people click on in Spotify are display ads. You can't click an audio clip.

brand: ['brand'],
Contributor

Brand advertising is not an ad format; it is a type of ad. No point listing it here.

display: ['advertorial', 'banner', 'cpa', 'cpc', 'cpm', 'cpv', 'discover', 'display', 'fbads', 'goppc', 'highimpact', 'inred', 'nps', 'paid', 'paiddisplay', 'placement', 'post', 'poster', 'pp', 'ppc'],
Collaborator

All the cp* and pp* could be combined as a simple regex, and paid already covers paiddisplay (also covered by display), post covers poster, etc.

email: ['em', 'email', 'mail', 'newsletter'],
Collaborator

email is already covered by em and mail

local: ['yext'],
owned: ['owned'],
Contributor

Again, not a category.

qr: ['qr', 'qrcode'],
Collaborator

qrcode is covered by qr

search: ['direct', 'google', 'googleflights', 'paidsearch', 'paidsearchnb', 'sea', 'sem'],
Contributor

Cue @ramboz's comment about sea including search, and so on.

sms: ['sms'],
social: ['facebook', 'gnews', 'instagramfeed', 'instagramreels', 'instagramstories', 'line', 'linkedin', 'metasearch', 'organicsocialown', 'paidsocial', 'social', 'sociallinkedin', 'socialpaid'],
video: ['native', 'paidvideo', 'pvid', 'video', 'youtube'],
web: ['webapp'],
};
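
Several review comments on the `categories` table suggest consolidating entries with regular expressions. As written, the lookup further down uses `Array.prototype.includes`, which matches the whole sanitized value exactly, so shorter entries do not currently cover longer ones. The sketch below shows what a pattern-based table could look like for a subset of the categories. The `categoryPatterns` table and `categorize` helper are hypothetical, and the patterns are one possible reading of the reviewers' suggestions, not the PR's implementation.

```javascript
// Hypothetical regex table covering a subset of the categories from the diff.
// Order matters: the first matching entry wins.
const categoryPatterns = [
  { category: 'affiliate', regex: /^aff/ },
  { category: 'email', regex: /^em$|mail|newsletter/ },
  { category: 'qr', regex: /^qr/ },
  { category: 'display', regex: /^(cp[acmv]|pp|ppc|goppc)$|banner|display/ },
];

// Returns the first category whose pattern matches, or '' when none does.
const categorize = (utmMedium) => {
  const match = categoryPatterns.find(({ regex }) => regex.test(utmMedium));
  return match ? match.category : '';
};

console.log(categorize('affiliatemarketing')); // 'affiliate'
console.log(categorize('cpc')); // 'display'
console.log(categorize('sms')); // ''
```

Anchors such as `^em$` keep very short tokens from matching unrelated values, which is the main risk of switching from exact matching to substring matching.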
const sources = {
paid: ['affiliate', 'audio', 'display', 'local', 'search', 'social', 'video'],
Collaborator

Shopping is a source I see often that is missing here.

Contributor

I don't see enough values to justify this. The most common I see is igshopping with 4 hosts, but that would be instagram.

Collaborator
@ramboz, Jun 28, 2024

@trieloff Shopping is not necessarily an actual value you find in the params; it's a category you can derive from the hostname/vendor, much like social or search. Shopping is usually used for e-commerce sites. So think big websites like Amazon/Etsy/Alibaba/Rakuten/…, or anything based on Shopify/Magento/Squarespace/…

So I'd kinda expect in categories above, something like:

  shopping: ['alibaba', 'amazon', 'bestbuy', 'ebay', 'flipkart', 'otto', 'rakuten', 'target', 'walmart'],

We could probably also throw in a few European ones, like fnac, zalando, etc. We would need to see what big hits we get in RUM.

owned: ['brand', 'email', 'owned', 'qr', 'sms', 'web'],
Collaborator

Push notifications could be added here as well.

};
// these 'vendors' appear differently in the utmsource field. They are mapped to a single value:
const vendorMappings = [
{ regex: /newsshowcase|aci|google|googleads|gads|google-ads|google_search|google_deman|aw|adwords|dv360|gdn|doubleclick|dbm|gmb/i, result: 'google' },
{ regex: /instagram|ig/i, result: 'instagram' },
{ regex: /face|fb|meta/i, result: 'facebook' },
{ regex: /email/i, result: 'email' },
{ regex: /bing/i, result: 'bing' },
{ regex: /amazon|ctv/i, result: 'amazon' },
{ regex: /qr/i, result: 'qrcode' },
{ regex: /youtube|yt/i, result: 'youtube' },
];
const utmMedium = sanitize(new URLSearchParams(window.location.search).get('utm_medium'));
const utmSource = sanitize(new URLSearchParams(window.location.search).get('utm_source'));
Comment on lines +311 to +312
Contributor

This assumes that medium and source are used as different values that carry actual information. I don't think this is the case, and I'd just treat them as one.

const preVendor = vendorMappings.find(({ regex }) => regex.test(utmSource))?.result || utmSource;
Contributor

The optional chaining limits browser compatibility. I don't think we use it anywhere else in the code base.
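
For reference, the same lookup can be written without optional chaining. The sketch below uses a two-entry subset of `vendorMappings` from the diff; the `mapVendor` helper is a hypothetical name, and this is one possible rewrite, not a change that was made in the PR.

```javascript
// Subset of the vendorMappings table from the diff.
const vendorMappings = [
  { regex: /bing/i, result: 'bing' },
  { regex: /youtube|yt/i, result: 'youtube' },
];

// Hypothetical helper: same fallback behavior as
// `vendorMappings.find(...)?.result || utmSource`, without `?.`.
const mapVendor = (utmSource) => {
  const mapping = vendorMappings.find(({ regex }) => regex.test(utmSource));
  return (mapping && mapping.result) || utmSource;
};

console.log(mapVendor('bingads')); // 'bing'
console.log(mapVendor('ytshorts')); // 'youtube'
console.log(mapVendor('yandex')); // 'yandex' (no mapping matches, value passes through)
```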

const category = Object.keys(categories).find((key) => (categories[key] || []).includes(utmMedium)) || '';
const source = Object.keys(sources).find((key) => (sources[key] || []).includes(category)) || '';
const vendor = moduli.every((modulo) => knownVendors.charAt(basicHash(preVendor, modulo)) === '1') ? preVendor : '';
sampleRUM('acquisition', { source: `${source}:${category}:${vendor}` });
})();
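
The vendor check in the block above decodes the hex string into 256 bits and reports a vendor only if the bit at each of its three hash positions is set. Below is a minimal standalone sketch of that membership test, with `basicHash`, `toBinary`, `moduli`, and the filter string copied from the diff. The `isKnownVendor` helper is a hypothetical name. The `sanitize` shown here adds the `g` flag as an assumption about the intent; the regex in the diff has no `g` flag and therefore removes only the first non-alphanumeric character.

```javascript
// Copied from the diff: sum of char codes (seeded with 1), reduced modulo `modulo`.
const basicHash = (string, modulo) => Array.from(string)
  .map((a) => a.charCodeAt(0))
  .reduce((a, b) => a + b, 1) % modulo;

// Copied from the diff: each hex digit becomes four bits.
const toBinary = (s) => Array.from(s, (c) => parseInt(c, 16).toString(2).padStart(4, '0')).join('');

// Assumption: `g` flag added so every non-alphanumeric character is stripped.
const sanitize = (str) => (str || '').toLowerCase().replace(/[^a-z0-9]/g, '');

const moduli = [239, 241, 251];
const knownVendors = toBinary('fbdef75ff9f4dedbfdeaba8f21e7884aebf67cfde6eefeea3b8ff32c6fb68a40');

// Hypothetical helper: true only if all three hash positions hold a '1'.
const isKnownVendor = (vendor) => moduli
  .every((modulo) => knownVendors.charAt(basicHash(vendor, modulo)) === '1');

console.log(knownVendors.length); // 256
console.log(isKnownVendor(sanitize('Google'))); // true
console.log(isKnownVendor('example')); // false
console.log(isKnownVendor('elgoog')); // true: same characters, same hash (a false positive)
```

The last line illustrates a property of this hash: it ignores character order, so any anagram of a known vendor passes the filter, in addition to the usual Bloom filter false positives.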
138 changes: 138 additions & 0 deletions test/generateAcquisitionBloomFilters.js
@@ -0,0 +1,138 @@
/*
* Copyright 2024 Adobe. All rights reserved.
* This file is licensed to you under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License. You may obtain a copy
* of the License at http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software distributed under
* the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR REPRESENTATIONS
* OF ANY KIND, either express or implied. See the License for the specific language
* governing permissions and limitations under the License.
*/

const moduli = [239, 241, 251];

const basicHash = (string, modulo) => Array.from(string)
.map((a) => a.charCodeAt(0))
.reduce((a, b) => a + b, 1) % modulo;

const binaryToText = (binaryString) => {
return parseInt(binaryString, 2).toString(16);
};

// known vendors
const vendors = [
'adlocus',
'admitadmonetize',
'aftership',
'amazon',
'attentive',
'avivid',
'baidu',
'banner',
'bing',
'blis',
'cheetah',
'cj',
'clarin',
'clm',
'criteo',
'demandgen',
'digidip',
'digitalremedycom',
'discovery',
'display',
'eloqua',
'email',
'eminent',
'facebook',
'famoussmokeshopinc',
'fark',
'fashionistatop',
'feedotter',
'flipboard',
'flyer',
'geniusmonkey',
'giftcardmall',
'google',
'hotstar',
'hrs',
'hsemail',
'inmobicom',
'inred',
'insider',
'instagram',
'integrateddisplay',
'internal',
'line',
'linkbux',
'linkedin',
'linkinbio',
'locationpage',
'lveng',
'm2trans',
'manutd',
'marketo',
'massiva',
'mavenintent',
'mediamond',
'mentionme',
'microsoft',
'native',
'newsletter',
'nexus',
'openweb',
'optum',
'outbrain',
'outlook',
'partner',
'partnerstudentbeanscom',
'petcademy',
'pinterest',
'pmax',
'programmatic',
'programmaticgdn',
'pushly',
'qrcode',
'reddit',
'redone',
'retailercode',
'seznam',
'shopfully',
'silverpop',
'sky',
'skyscanner',
'snapchat',
'spotify',
'substack',
'taboola',
'teads',
'thetradedesk',
'tiktok',
'tradedesk',
'tradetracker',
'twitter',
'web',
'yahoo',
'yandex',
'yext',
'yieldkit',
'youtube',
];

// Initialize 256 chars long array filled with initial zeros
const bloomFilter = new Array(256).fill(0);

// Insert each vendor into the Bloom filter
vendors.forEach((vendor) => {
moduli.forEach((modulo) => {
const hash = basicHash(vendor, modulo);
bloomFilter[hash] = 1;
});
});

const f = bloomFilter.reduce((acc, _, index) => (index % 4 === 0 ? [...acc, bloomFilter.slice(index, index + 4).join('')] : acc), [])
.map((binaryChar) => binaryToText(binaryChar))
.join('');

console.log(`Bloom Filter: ${f}`);
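
A quick way to sanity-check the generator is a round trip: build a filter, decode it with the `toBinary` function from `src/index.js`, and confirm that every inserted vendor is found. The sketch below does this for a two-vendor list. The `buildFilter` and `contains` helpers are hypothetical wrappers around the steps in the script above, and the sketch makes no claim about reproducing the exact hex string embedded in `src/index.js`.

```javascript
const moduli = [239, 241, 251];

// Copied from the diff.
const basicHash = (string, modulo) => Array.from(string)
  .map((a) => a.charCodeAt(0))
  .reduce((a, b) => a + b, 1) % modulo;

// Hypothetical wrapper around the generator steps: set three bits per vendor,
// then pack every group of four bits into one hex digit.
const buildFilter = (vendors) => {
  const bits = new Array(256).fill(0);
  vendors.forEach((vendor) => moduli.forEach((modulo) => {
    bits[basicHash(vendor, modulo)] = 1;
  }));
  let hex = '';
  for (let i = 0; i < bits.length; i += 4) {
    hex += parseInt(bits.slice(i, i + 4).join(''), 2).toString(16);
  }
  return hex;
};

// Decoder copied from src/index.js in the diff.
const toBinary = (s) => Array.from(s, (c) => parseInt(c, 16).toString(2).padStart(4, '0')).join('');

const hex = buildFilter(['bing', 'yext']);
const decoded = toBinary(hex);

// Hypothetical helper mirroring the lookup in src/index.js.
const contains = (vendor) => moduli
  .every((modulo) => decoded.charAt(basicHash(vendor, modulo)) === '1');

console.log(hex.length); // 64
console.log(contains('bing'), contains('yext')); // true true
console.log(contains('google')); // false
```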