
Search API response returns entire contents of matching lines which is problematic for files with very large lines #3090


Open
MadApe opened this issue Mar 24, 2020 · 0 comments


Hello,

First of all, I'm really enjoying using and learning OpenGrok.

I'm running into trouble with the search API returning the entire contents of matching lines in its response. When the results include a file with a very large matching line (gigabytes in size), the response is huge and takes a long time to process. I've seen this with both binary and text files, though I try to exclude binary files from indexing.

Here's an example. For simplicity's sake, suppose a query matches exactly one line in a single file. The search API response will contain the matching file, the line number, and the full text of the matching line. Now suppose that file is a 1 GB JSON file consisting of a single line: the response will include that entire line, all 1 GB of it.
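To make the response shape concrete, here is a minimal Python sketch that flags oversized matches in an already-parsed search response. The JSON layout it assumes (a `results` object keyed by file path, each hit carrying `line` and `lineNumber`) is my assumption about the API output, and `oversized_lines` with its 4096-character `limit` is hypothetical. It does not reduce the transfer cost, since the response has already been downloaded by then; it only helps identify which files produce huge matches.

```python
from typing import Any, Dict, List, Tuple


def oversized_lines(response: Dict[str, Any], limit: int = 4096) -> List[Tuple[str, str, int]]:
    """Return (path, line number, length) for each matched line longer than `limit` characters.

    Assumes (hypothetically) a response shaped like:
    {"results": {"/project/file.json": [{"lineNumber": "1", "line": "..."}]}}
    """
    found = []
    for path, hits in response.get("results", {}).items():
        for hit in hits:
            text = hit.get("line") or ""
            if len(text) > limit:
                found.append((path, str(hit.get("lineNumber")), len(text)))
    return found


# Example with a hand-made in-memory response (no server involved).
sample = {"results": {"/proj/big.json": [{"lineNumber": "1", "line": "x" * 10_000}]}}
print(oversized_lines(sample))  # [('/proj/big.json', '1', 10000)]
```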

Such large responses can take a long time to retrieve. Add more users, and sometimes more than one "huge match" per query, and we end up with a significant problem scaling OpenGrok.

Expected Behavior:
Is there any way to limit the size of the matching line returned by the search API, to prevent these large responses? I had expected the API response to contain the same truncated line that the UI displays.
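For illustration, a minimal sketch of the kind of truncation I had in mind, assuming the server knows the character offsets of the match within the line. `truncate_around_match` and its 200-character default are hypothetical and not part of OpenGrok; I don't know exactly how the UI decides where to cut.

```python
def truncate_around_match(line: str, start: int, end: int, max_chars: int = 200) -> str:
    """Return at most `max_chars` characters of `line` around line[start:end].

    Cuts are marked with '...', so the result can be up to max_chars + 6 characters long.
    """
    if len(line) <= max_chars:
        return line
    match_len = end - start
    if match_len >= max_chars:
        # The match alone exceeds the budget: keep only its beginning.
        return line[start:start + max_chars] + "..."
    budget = max_chars - match_len
    # Spend about half of the remaining budget on context before the match.
    left = max(0, start - budget // 2)
    right = min(len(line), end + (budget - (start - left)))
    # If the line ended early, shift the unused budget back to the left.
    left = max(0, right - max_chars)
    prefix = "..." if left > 0 else ""
    suffix = "..." if right < len(line) else ""
    return prefix + line[left:right] + suffix


line = "a" * 1000 + "NEEDLE" + "b" * 1000
# Prints 22 characters of context on each side of NEEDLE, with '...' at both ends.
print(truncate_around_match(line, 1000, 1006, max_chars=50))
```

Keeping the window centred on the match means the snippet still shows why the line matched, while the size of each hit is bounded by `max_chars` plus the two ellipses.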

Steps to Reproduce:

  1. Create a very large single-line text file
  2. Index the file
  3. Call the search API with criteria that will match the file created in step 1
  4. Observe that the 'line' returned in the response is the full content of the file
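For step 1, a small Python sketch that produces a valid single-line JSON file of a chosen size. The file layout, the `findme` needle, and the function name `write_single_line_json` are arbitrary choices for illustration; a full-text search for the needle should then match line 1 of the file.

```python
def write_single_line_json(path: str, target_bytes: int, needle: str = "findme") -> int:
    """Write a valid JSON array with no newline characters, at least `target_bytes` long.

    Writes in fixed-size records so memory use stays small even for GB-sized files.
    Returns the number of bytes written.
    """
    record = b'{"filler":"' + b"x" * 1000 + b'"},'
    with open(path, "wb") as f:
        # The first element carries a searchable token; the rest is padding.
        written = f.write(b'[{"needle":"' + needle.encode("ascii") + b'"},')
        while written < target_bytes:
            written += f.write(record)
        # Close the array with an empty object so the trailing comma stays valid JSON.
        written += f.write(b"{}]")
    return written
```

For example, `write_single_line_json("big.json", 1 << 30)` writes a file of roughly 1 GiB.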

Components and Versions:
OpenGrok: 1.2.9, 1.3.3
OS: CentOS 7 - 3.10.0-957.el7.x86_64
Java: 1.8.0_211
Tomcat: Apache Tomcat/8.5.39

Some discussion of this has happened here: https://opengrok.slack.com/archives/C6WH95VLN/p1585075008044600

I appreciate the guidance provided on the Slack channel. Please let me know if there is any additional information you need.

Thanks,
Phil
