UI and API /search return vastly different result sizes in project-less mode #3170

Open
mkboudreau opened this issue Jul 7, 2020 · 14 comments


@mkboudreau

Describe the bug
The API and the UI do not seem to return the same results. Maybe I'm missing something, but when I run the exact same search from the UI and the API, I get 125 results from the API and 9,708 from the UI. Something does not seem right.

To Reproduce
Try /search?... and /api/v1/search?... with all the same query parameters. In my test only full= was set.

Expected behavior
I expect the UI and the REST API to return the same results.

Screenshots

  • UI in Browser

URL: http://local-opengrok/search?full=test&defs=&refs=&path=&type=

[Screenshot: UI search results]

  • API in terminal

URL: http://local-opengrok/api/v1/search?full=test&defs=&refs=&path=&type=

curl -s 'http://local-opengrok/api/v1/search?full=test&defs=&refs=&path=&type=' | jq .resultCount
125
curl -s 'http://local-opengrok/api/v1/search?full=test&defs=&refs=&path=&type=' | jq 
{
  "time": 167,
  "resultCount": 125,
  "startDocument": 0,
  "endDocument": 124,
  "results": {
...
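
For anyone who wants to repeat the comparison, here is a minimal Python sketch that rebuilds the API URL from the report and summarizes a response. The helper names `build_search_url` and `summarize` are hypothetical, and the sample payload simply copies the fields shown in the output above; nothing here calls a live server.

```python
from urllib.parse import urlencode


def build_search_url(base, **params):
    """Build an /api/v1/search URL; empty parameters are kept, as in the report."""
    return f"{base}/api/v1/search?{urlencode(params)}"


def summarize(response):
    """Return (resultCount, number of documents implied by startDocument/endDocument)."""
    span = response["endDocument"] - response["startDocument"] + 1
    return response["resultCount"], span


# Sample payload copied from the terminal output above (results omitted).
sample = {"time": 167, "resultCount": 125, "startDocument": 0, "endDocument": 124}

url = build_search_url(
    "http://local-opengrok", full="test", defs="", refs="", path="", type=""
)
print(url)
print(summarize(sample))
```

If the count and the document span agree, as they do here, the cap is being applied before the response is assembled rather than by client-side truncation.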
@vladak
Member

vladak commented Jul 20, 2020

When you do the search in the UI, do you have any projects selected? (Assuming you are running with projects enabled.)

@mkboudreau
Author

> When you do the search in the UI, do you have any projects selected? (Assuming you are running with projects enabled.)

The screenshot shows the entire UI; there is nothing else on the screen. I do not see anything relating to "projects".

@vladak
Member

vladak commented Jul 29, 2020

How do you run the indexer?

@mkboudreau
Author

mkboudreau commented Jul 29, 2020

The indexer runs in a container and, upon completion, bounces the web app container. Both containers point to the same data and source dirs.

opengrok-indexer \
    -j /usr/bin/java \
    -J=-Djava.util.logging.config.file=/var/opengrok/logging.properties \
    -J=-Xms6g -J=-Xmx6g \
    ${JMX_OPTIONS} ${HEAPDUMP_OPTIONS} \
    -a /opt/opengrok/lib/opengrok.jar \
    -- \
    --verbose \
    --progress \
    --assignTags \
    --source /opengrok/sources \
    --dataRoot /opengrok/data \
    --renamedHistory on \
    --memory 256 \
    -i node_modules -i vendor -i *.dll -i *.so -i *.exe -i *.jar -i *.gz

Why do you think the indexer would cause the REST API and the web UI to return different results?

@vladak
Member

vladak commented Jul 30, 2020

I was asking because of the projects. You're running the indexer with projects disabled and it might be relevant for root causing this issue.

@mkboudreau
Author

ok, thank you for the clarification. please let me know if there is anything else you need from me.

@mkboudreau
Author

@vladak any luck duplicating this issue?

@vladak
Member

vladak commented Aug 7, 2020

Tried with a simple project-less setup and could not reproduce it. Is there something special about those 125 search hits?

@mkboudreau
Author

I just reran the queries and I cannot see anything special. It appears to be a subset, but I see a variety of file types, directory structures, etc. They all seem valid, except for there being 125 instead of ~140,000. The indexer indexes a local directory of all our organization's git repos in the format /org/repo1, /org/repo2, and so on.

Other searches I've run against the REST endpoint (e.g. /api/v1/search?full=somesearch) are also capped at 125.

@idodeclare
Contributor

> I was asking because of the projects. You're running the indexer with projects disabled and it might be relevant for root causing this issue.

@vladak, you are correct: when projects are disabled, there seems to be an erroneous "double paging" going on. In a project-less configuration, SearchEngine filters records early even though /api/v1/search will also try to do so later. The number of results in the SearchEngine paging is by default numHitsPerPage * cachePages, or 25 * 5 = 125.

The projects-enabled search by SearchEngine however also seems undesirably expensive in that it manifests every document found even though /api/v1/search will later filter to a page of (by default) 1000 results.
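
To illustrate the arithmetic, here is a simplified Python model of the two paging stages described above. It is only a sketch, not OpenGrok code: the function names are hypothetical, and the defaults (25 hits per page, 5 cached pages, a 1000-result API page) are the values quoted in this comment.

```python
def engine_page(total_hits, hits_per_page=25, cache_pages=5):
    """Project-less path (as described above): keep at most hits_per_page * cache_pages hits."""
    return min(total_hits, hits_per_page * cache_pages)


def api_page(available, start=0, max_results=1000):
    """API layer: slice whatever it was handed down to a single page."""
    return max(0, min(available - start, max_results))


def docs_returned(total_hits, projects_enabled):
    """Number of documents that reach the API caller in this simplified model."""
    # With projects enabled, every hit is materialized before the API pages it.
    available = total_hits if projects_enabled else engine_page(total_hits)
    return api_page(available)


print(docs_returned(9708, projects_enabled=False))
print(docs_returned(9708, projects_enabled=True))
```

In this model the project-less path can never hand the API more than 125 documents, no matter how many hits the index contains, which matches the cap reported in this issue.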

@mkboudreau
Author

@vladak given the investigation @idodeclare has done, this feels like a legitimate issue. Do you agree? What are the next steps?

@mkboudreau
Author

@vladak any update on this issue?

@vladak
Member

vladak commented Sep 23, 2020

sorry, no bandwidth to work on this currently.

vladak changed the title from "UI and API /search return vastly different result sizes" to "UI and API /search return vastly different result sizes in project-less mode" on Dec 3, 2020
@vladak
Member

vladak commented Dec 3, 2020

> The projects-enabled search by SearchEngine however also seems undesirably expensive in that it manifests every document found even though /api/v1/search will later filter to a page of (by default) 1000 results.

This could lead to #1806 I think.
