Skip to content

core: utility function for generating eval code #10781

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from
Closed

Conversation

connorjclark
Copy link
Collaborator

@connorjclark connorjclark commented May 14, 2020

The current interface for executing JavaScript in a page demands this of the developer:

  1. inject the necessary dependencies, including the function you want to return the value of
  2. call the function
  3. serialize the parameters of the function properly
  4. the return value is any, so be sure to do additional type checking, if you care to. or just coerce it.
const result = await passContext.driver.evaluateAsync(`(() => {
  ${calculateLogMath};
  ${runTheNumbers};

  return runTheNumbers(`${number1}`, `${number2}`);
})()`, {useIsolation: true});

Idea 1 - createEvalCode

step 3 is where most of the developer burden surfaces. it can be done in one of three ways, depending on necessity:

`...
  return callMeMaybe(`${number}`, '${string}', ${JSON.stringify(object)});
`

To address the serialization woes, I propose this interface:

const code = pageFunctions.createEvalCode(callMeMaybe, {
  args: [number1, aString, object],
  deps: [
    calculateLogMath,
  ],
}),
const result = await passContext.driver.evaluateAsync(code, {useIsolation: true});

A real-er example (from link-elements.js):

// 1) Current way.
return passContext.driver.evaluateAsync(`(() => {
  ${getElementsInDocumentString};
  ${getLinkElementsInDOM};

  return getLinkElementsInDOM();
})()`, {useIsolation: true});

// 2) Creating code with `createEvalCode` helper.
const code = pageFunctions.createEvalCode(getLinkElementsInDOM, {
  deps: [
    pageFunctions.getElementsInDocumentString,
  ],
});
return passContext.driver.evaluateAsync(code, {useIsolation: true});

trace-elements.js must return a function declaration, so there should also be a mode: 'iffe'|'function' to control the code generation. See the diff to this draft PR for details.

I think this would be useful for many of the more complex usages of evaluateAsync:

afterPass(passContext) {
const expression = `(function() {
const tapTargetsSelector = "${tapTargetsSelector}";
${pageFunctions.getElementsInDocumentString};
${disableFixedAndStickyElementPointerEvents.toString()};
${elementIsVisible.toString()};
${elementHasAncestorTapTarget.toString()};
${elementCenterIsAtZAxisTop.toString()}
${truncate.toString()};
${getClientRects.toString()};
${hasTextNodeSiblingsFormingTextBlock.toString()};
${elementIsInTextBlock.toString()};
${getRectArea.toString()};
${getLargestRect.toString()};
${getRectCenterPoint.toString()};
${rectContains.toString()};
${pageFunctions.getNodePathString};
${pageFunctions.getNodeSelectorString};
${pageFunctions.getNodeLabelString};
${gatherTapTargets.toString()};
return gatherTapTargets();
})()`;
return passContext.driver.evaluateAsync(expression, {useIsolation: true});

const response = await driver.sendCommand('Runtime.callFunctionOn', {
objectId,
functionDeclaration: `function () {
${setAttributeMarker.toString()};
${pageFunctions.getNodePathString};
${pageFunctions.getNodeSelectorString};
${pageFunctions.getNodeLabelString};
${pageFunctions.getOuterHTMLSnippetString};
return setAttributeMarker.call(this, '${metricName}');
}`,
returnByValue: true,
awaitPromise: true,
});
if (response && response.result && response.result.value) {
traceElements.push(response.result.value);
}

/**
* @param {string} src
*/
/* istanbul ignore next */
function injectIframe(src) {
/** @type {HTMLIFrameElement} */
const iframe = document.createElement('iframe');
// Try really hard not to affect the page.
iframe.style.display = 'none';
iframe.style.visibility = 'hidden';
iframe.style.position = 'absolute';
iframe.style.top = '-1000px';
iframe.style.left = '-1000px';
iframe.style.width = '1px';
iframe.style.height = '1px';
iframe.src = src;
iframe.onload = iframe.onerror = () => {
iframe.remove();
delete iframe.onload;
delete iframe.onerror;
};
document.body.appendChild(iframe);
}
await this.driver.evaluateAsync(`${injectIframe}(${JSON.stringify(url)})`, {
useIsolation: true,
});

Simple ones like this should probably just remain as-is.

I think the criteria is: if it requires more than one function or parameter serialization, it'd benefit from pageFunction.createEvalCode.

Idea 2 - types

We can "tag" the return value of createEvalCode, which returns a string, with the return type of mainFn. driver.evaluateAsync can then attempt to pattern match for that type, and mark its return value as the same!

The nice part about this is that it's "tagged" via an intersection type string & {__taggedType: T}, so this value can be used in any way a string can be used, which is mighty convenient since at runtime it's only ever gonna be a string.

Instances where the type would be useful for code readability and correctness (including most of the examples already linked above):

(tag is any)

const scriptSrc = `(${collectTagsThatBlockFirstPaint.toString()}())`;
const firstRequestEndTime = networkRecords.reduce(
(min, record) => Math.min(min, record.endTime),
Infinity
);
return driver.evaluateAsync(scriptSrc).then(tags => {
const requests = TagsBlockingFirstPaint._filteredAndIndexedByUrl(networkRecords);
return tags.reduce((prev, tag) => {
const request = requests[tag.url];
if (request && !request.isLinkPreload) {
// Even if the request was initially blocking or appeared to be blocking once the
// page was loaded, the media attribute could have been changed during load, capping the
// amount of time it was render blocking. See https://github.com/GoogleChrome/lighthouse/issues/2832.
const timesResourceBecameNonBlocking = (tag.mediaChanges || [])
.filter(change => !change.matches)
.map(change => change.msSinceHTMLEnd);
const earliestNonBlockingTime = Math.min(...timesResourceBecameNonBlocking);
const lastTimeResourceWasBlocking = Math.max(
request.startTime,
firstRequestEndTime + earliestNonBlockingTime / 1000
);

(elements is coerced)

const expression = `(function() {
${pageFunctions.getElementsInDocumentString}; // define function on page
${getClientRect.toString()};
${getHTMLImages.toString()};
${getCSSImages.toString()};
${collectImageElementInfo.toString()};
return collectImageElementInfo();
})()`;
/** @type {Array<LH.Artifacts.ImageElement>} */
const elements = await driver.evaluateAsync(expression);

(return value is coerced)

return driver.evaluateAsync(`(() => {
${getElementsInDocumentString};
return getElementsInDocument('head meta').map(meta => {
return {
name: meta.name.toLowerCase(),
content: meta.content,
property: meta.attributes.property ? meta.attributes.property.value : undefined,
httpEquiv: meta.httpEquiv ? meta.httpEquiv.toLowerCase() : undefined,
charset: meta.attributes.charset ? meta.attributes.charset.value : undefined,
};
});
})()`, {useIsolation: true});

(value is coerced)

const expression = `(() => {
${pageFunctions.getOuterHTMLSnippetString};
${pageFunctions.getElementsInDocumentString};
${pageFunctions.isPositionFixedString};
return (${collectIFrameElements})();
})()`;
/** @type {LH.Artifacts['IFrameElements']} */
const iframeElements = await driver.evaluateAsync(expression, {useIsolation: true});
return iframeElements;

(value is coerced)

const expression = `(() => {
${pageFunctions.getOuterHTMLSnippetString};
${pageFunctions.getElementsInDocumentString};
${pageFunctions.getNodePathString};
${pageFunctions.getNodeSelectorString};
${pageFunctions.getNodeLabelString};
return (${collectAnchorElements})();
})()`;
/** @type {Array<LH.Artifacts.AnchorElement>} */
return driver.evaluateAsync(expression, {useIsolation: true});

There's a few more like these, where afterPass directly returns the result of evaluateAsync and the type is coerced to w/e the @return JSDoc says it should be.

The criteria I mentioned earlier might then become: if it requires more than one function, parameter serialization, or returns a value, it'd benefit from pageFunction.createEvalCode.

Idea 3 - scope the dependency functions

Example:

return driver.evaluateAsync(`(() => {
${getElementsInDocumentString};
return getElementsInDocument('head meta').map(meta => {
return {
name: meta.name.toLowerCase(),
content: meta.content,
property: meta.attributes.property ? meta.attributes.property.value : undefined,
httpEquiv: meta.httpEquiv ? meta.httpEquiv.toLowerCase() : undefined,
charset: meta.attributes.charset ? meta.attributes.charset.value : undefined,
};
});
})()`, {useIsolation: true});

All the logic in this gatherer is in a string, so nothing is type checked. Even if moved to a function and stringified like most other usages of evaluateAsync, getElementsInDocument would need to be ignored with eslint and tsignored because it doesn't actually exist until executed in the browser.

page-functions.js could export getElementsInDocument, and if the function referenced it then typecheck/linting would work, but would error at runtime because pageFunctions doesn't exist in the browser.

If instead we injected a pageFunction = {...} object with all the dependencies, we could import the function in Node, use it in our browser function, and maintain all the type checking. It'd look like this:

const Gatherer = require('./gatherer.js');
const pageFunctions = require('../../lib/page-functions.js');

function getMetaElements() {
  return pageFunctions.getElementsInDocument('head meta').map(el => {
    const meta = /** @type {HTMLMetaElement} */ (el);
    return {
      name: meta.name.toLowerCase(),
      content: meta.content,
      property: meta.attributes.property ? meta.attributes.property.value : undefined,
      httpEquiv: meta.httpEquiv ? meta.httpEquiv.toLowerCase() : undefined,
      charset: meta.attributes.charset ? meta.attributes.charset.value : undefined,
    };
  });
}

class MetaElements extends Gatherer {
  /**
   * @param {LH.Gatherer.PassContext} passContext
   * @return {Promise<LH.Artifacts['MetaElements']>}
   */
  async afterPass(passContext) {
    const driver = passContext.driver;

    // We'll use evaluateAsync because the `node.getAttribute` method doesn't actually normalize
    // the values like access from JavaScript does.
    const code = pageFunctions.createEvalCode(getMetaElements, {
      deps: [
        pageFunctions.getElementsInDocument,
      ],
    });
    return driver.evaluateAsync(code, {useIsolation: true});
  }
}

As part of this, I think page-functions.js should always be imported like this:

const pageFunctions = require('../../page-function.js');

and stop exporting the *String variants of each function, just export the function.

At this point, we might even consider just injecting all the page functions in createEvalCode, but that may be too wasteful (this has to cost something, right?)

Idea 4 - debugging

Debugging page functions is tedious. Best way I've found so far is to just throw an error with the data I want to log. It'd be nice if any console.logs executed in the code would be collected and printed when the code returns.

could do this by listening to Console.messageAdded during execution. could require something like console.log('lhdebug', ...) so that it only logs stuff we are debugging, not something else the page is doing.

@connorjclark connorjclark requested a review from a team as a code owner May 14, 2020 06:10
@connorjclark connorjclark requested review from brendankenny and removed request for a team May 14, 2020 06:10
@connorjclark connorjclark marked this pull request as draft May 14, 2020 06:11
@brendankenny
Copy link
Contributor

This looks really good and agreed this will make evaluateAsync a lot more pleasant (and safe!) to use.

A few thoughts, though I've only read the code, not played around with it with the type checker, so maybe you've already worked through these:

  1. Do we need to keep the string intermediate around? It's a quite nice use of a tagged type, but could we skip that step and make a version of evaluateAsync that takes the arguments of createEvalCode directly (so it would be similar in signature to puppeteer's page.evaluate)?

  2. (re: Idea 3) it seems like auto-wrapping the page functions in a namespace object would have the same problem if people locally did const {getElementsInDocument} = require('../../page-function.js'); which is hard to enforce not to do. Also wouldn't a deps function defined in the same gatherer file as mainFn also get nested in pageFunctions and so have the nested/not-nested mismatch as well? This might just be a case where there's always going to be issues and we just need to make debugging easier.

  1. (re: Idea 4) And so improving debugging is a good idea :) I usually copy the constructed expression and then paste it into devtools, but that's not very automated either :) If we listened for Console.messageAdded (Runtime.consoleAPICalled?) we might need to make it an opt-in debug flag since some sites log a lot of things, and I imagine we don't want to be dealing with the traffic (even if immediately discarded) while taking a trace.

@brendankenny
Copy link
Contributor

2. This might just be a case where there's always going to be issues and we just need to make debugging easier.

We'll also probably always be living with some level of typescript issues no matter what and we'll just have to be ok with it. tsc is really not built to check javascript that can run in two different domains defined in the same project (see the 2.5 year old issue just to make it not a gigantic pain to define worker code in the same project as main-thread code: microsoft/TypeScript#20595, and that's not dealing with the node/browser split). That's partly why we started calling a spade a spade and used function strings explicitly in the first place, but that was still early days for type checking in Lighthouse (or predated most of it?) so we can definitely do better now.

debugging

another idea (drifting farther from the proposed PR) is that with multi-client you can now get another devtools to connect to the Lighthouse-controlled page to do some debugging. I just tried putting a debugger statement in anchor-elements so it would pause there, ran lighthouse normally with --port:9222 which is a port I've configured for target discovery, went to about:inspect in my usual Chrome, and was able to debug normally. That's actually a pretty great experience, so we should automate that as much as possible :)

@connorjclark
Copy link
Collaborator Author

Do we need to keep the string intermediate around? It's a quite nice use of a tagged type, but could we skip that step and make a version of evaluateAsync that takes the arguments of createEvalCode directly (so it would be similar in signature to puppeteer's page.evaluate )?

It's possible, but my concern is I think we consider driver.evaluateAsync to be part of our public API, and so custom gatherers would break if we change that. Maybe driver.evaluateAsync could be changed to take a string or an object defining {main, mode, args, deps}, and so we can avoid breaking changes?

(re: Idea 3) it seems like auto-wrapping the page functions in a namespace object would have the same problem if people locally did const {getElementsInDocument} = require('../../page-function.js'); which is hard to enforce not to do.

Another benefit of importing as const pageFunctions = ... is that explicit usages of pageFunction.xyz denotes that this is a function that must be serialized, kinda like how our usage of someFunctionString does.

As far as enforcing, I think we're just left with code reviews. If we change all imports then I think further contributions won't be influenced to use the destructuring format.

Also wouldn't a deps function defined in the same gatherer file as mainFn also get nested in pageFunctions and so have the nested/not-nested mismatch as well? This might just be a case where there's always going to be issues and we just need to make debugging easier.

Ah yeah, that's true...for this to work it'd have to be more like

const pageFunctions = {
  ...require('../page-functions.js'),
  myDep,
}

function myDep() {
  // ...
}

function myMain () {
  pageFunctions.myFunction(1),
}

IMO this is pretty nice.

I think adopting a standard that all injected functions need to be on a namespaced pageFunctions object will help to meld these two different runtimes.

If we listened for Console.messageAdded (Runtime.consoleAPICalled?) we might need to make it an opt-in debug flag since some sites log a lot of things, and I imagine we don't want to be dealing with the traffic (even if immediately discarded) while taking a trace.

Agreed.

We'll also probably always be living with some level of typescript issues no matter what ... That's partly why we started calling a spade a spade and used function strings explicitly in the first place, but that was still early days for type checking in Lighthouse (or predated most of it?) so we can definitely do better now.

I think we'll be able to get 100% TS coverage here if we buy into the pageFunctions dependency namespace.

another idea (drifting farther from the proposed PR) is that with multi-client you can now get another devtools to connect to the Lighthouse-controlled page to do some debugging. I just tried putting a debugger statement in anchor-elements so it would pause there, ran lighthouse normally with --port:9222 which is a port I've configured for target discovery, went to about:inspect in my usual Chrome, and was able to debug normally. That's actually a pretty great experience, so we should automate that as much as possible :)

I'll have to give that a whirl.

@brendankenny
Copy link
Contributor

brendankenny commented May 15, 2020

It's possible, but my concern is I think we consider driver.evaluateAsync to be part of our public API, and so custom gatherers would break if we change that. Maybe driver.evaluateAsync could be changed to take a string or an object defining {main, mode, args, deps}, and so we can avoid breaking changes?

Agreed! That's what I meant about page.evaluate, which takes either a string or a function (+ args).

const pageFunctions = {
  ...require('../page-functions.js'),
  myDep,
}

function myDep() {
  // ...
}

function myMain () {
  pageFunctions.myFunction(1),
}

IMO this is pretty nice.

we'll have to agree to disagree :)

We'll also probably always be living with some level of typescript issues no matter what ... That's partly why we started calling a spade a spade and used function strings explicitly in the first place, but that was still early days for type checking in Lighthouse (or predated most of it?) so we can definitely do better now.

I think we'll be able to get 100% TS coverage here if we buy into the pageFunctions dependency namespace.

Oh, my point was separate from pageFunctions. We have both lib.dom.d.ts and node.d.ts turned on for our project, which can't both be true. We mostly get away with it by convention and eslint-env, but it's not actually checked by tsc and it's all off when a file sets eslint-env browser :) There's not much we can do about that without much bigger changes...tsc really wants them to be two disjoint sets of files type checked in separate invocations, which might be doable but doesn't seem worth it.

I'll have to give that a whirl.

It's surprisingly usable! We'd just need to turn off the protocol timeout.

@connorjclark
Copy link
Collaborator Author

For a first pass, I can put up a PR that just extends evaluateAsync to optionally take a string or a function + the mentioned options. That's what I wanted most out of this anyway :)

@connorjclark
Copy link
Collaborator Author

May refer back to this for further enhancements. #10816 has done the most important bit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants