Search API response returns entire contents of matching lines which is problematic for files with very large lines #3090

MadApe · 2020-03-24T22:00:18Z

Hello,

First of all, I'm really enjoying using and learning OpenGrok.

I'm running into trouble with the search API returning the entire contents of matching lines in the response. When the response includes a file with a very large matching line (on the order of gigabytes), the response is huge and takes a long time to process. I've seen this problem with both binary files and text files, though I try to exclude binary files from being indexed.

Here's an example. For simplicity's sake, consider a query that matches exactly one line in a single file. The response from the search API will return the matching file, the line number, and the full text of the matching line. Now consider that the file is a 1GB JSON file consisting of a single line. The response will include the entire matching line, which is 1GB in size.

These large responses can take a lot of time to retrieve. With more users, and sometimes more than one "huge match" in a single response, this becomes a significant problem for scaling OpenGrok.

Expected Behavior:
Is there any way to limit the size of the matching line that the search API returns to prevent these large API responses? I had expected that the API response would reflect the same "truncated" line that the UI displays.
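To illustrate the kind of truncation I have in mind, here is a minimal sketch in Python. It is not OpenGrok code: `truncate_line`, its `context` parameter, and the `...` marker are hypothetical names chosen for this example, and I am assuming the match offsets within the line are already known.

```python
def truncate_line(line: str, match_start: int, match_end: int,
                  context: int = 100, marker: str = "...") -> str:
    """Return the matched text plus at most `context` characters on each side.

    Hypothetical helper for illustration only; not part of OpenGrok.
    """
    if not 0 <= match_start <= match_end <= len(line):
        raise ValueError("match offsets fall outside the line")
    left = max(0, match_start - context)
    right = min(len(line), match_end + context)
    # Add a marker only on the sides where text was actually cut off.
    prefix = marker if left > 0 else ""
    suffix = marker if right < len(line) else ""
    return prefix + line[left:right] + suffix


# A 2 MB single-line "file" with one match in the middle.
huge = "x" * 1_000_000 + "needle" + "y" * 1_000_000
start = huge.index("needle")
snippet = truncate_line(huge, start, start + len("needle"), context=10)
print(len(snippet))  # far smaller than len(huge)
```

With something like this applied on the server side, the size of each returned line would be bounded by the context window rather than by the line length.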

Steps to Reproduce:

1. Create a very large single-line text file
2. Index the file
3. Call the search API with criteria that will match the file created in step 1
4. Observe that the 'line' returned in the response is the full content of the file
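The steps above can be sketched roughly as follows. This is an illustration under assumptions, not an exact record of what I ran: `write_single_line_json` and `search_url` are hypothetical helpers, the base URL and project name are placeholders, and the `full` and `projects` query parameters on `/api/v1/search` should be checked against the REST API documentation for your OpenGrok version.

```python
import json
from urllib.parse import urlencode


def write_single_line_json(path: str, target_bytes: int) -> int:
    """Write a JSON array with no newlines, at least `target_bytes` long.

    Returns the number of bytes written (the content is pure ASCII).
    """
    chunk = json.dumps({"key": "value", "needle": "findme"})
    written = 0
    with open(path, "w", encoding="utf-8") as f:
        f.write("[")
        written += 1
        first = True
        while written < target_bytes:
            if not first:
                f.write(",")
                written += 1
            f.write(chunk)
            written += len(chunk)
            first = False
        f.write("]")
        written += 1
    return written


def search_url(base_url: str, query: str, project: str) -> str:
    """Build a full-text search URL (parameter names are an assumption)."""
    params = urlencode({"full": query, "projects": project})
    return f"{base_url.rstrip('/')}/api/v1/search?{params}"
```

For example, `write_single_line_json("huge.json", 1024**3)` produces a file of roughly 1GB on a single line. After indexing, requesting `search_url("http://localhost:8080/source", "findme", "demo")` is the call whose response contains the whole line.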

Components and Versions:
OpenGrok: v.1.2.9, v.1.3.3
OS: CentOS 7 - 3.10.0-957.el7.x86_64
Java: 1.8.0_211
Tomcat: Apache Tomcat/8.5.39

Some discussion of this has happened here: https://opengrok.slack.com/archives/C6WH95VLN/p1585075008044600

I appreciate the guidance provided on the Slack channel. Please let me know if there is any additional information you need.

Thanks,
Phil

vladak added API enhancement labels Mar 25, 2020

idodeclare mentioned this issue Mar 28, 2020

Alter handling of huge text files #3097

Open

vladak mentioned this issue Oct 13, 2021

search API def return more than needed #3748

Closed
