
API search across large number of projects results in java.lang.OutOfMemoryError #1806

Open
vladak opened this issue Oct 4, 2017 · 6 comments


@vladak
Member

vladak commented Oct 4, 2017

Performing a JSON search across all projects (https://mygrok/source/json?freetext=foo&maxresults=80 - note there is no project parameter) results in:

java.lang.OutOfMemoryError: GC overhead limit exceeded
Dumping heap to /var/tomcat8/logs/dumps/java_pid21674.hprof ...
Heap dump file created [7654991411 bytes in 57.620 secs]
Exception in thread "http-nio-8888-exec-3" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.lang.StringCoding.decode(StringCoding.java:215)
        at java.lang.String.<init>(String.java:463)
        at java.lang.String.<init>(String.java:515)
        at org.apache.lucene.document.DocumentStoredFieldVisitor.stringField(DocumentStoredFieldVisitor.java:75)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.readField(CompressingStoredFieldsReader.java:222)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:595)
        at org.apache.lucene.index.CodecReader.document(CodecReader.java:88)
        at org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:118)
        at org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:118)
        at org.apache.lucene.index.IndexReader.document(IndexReader.java:370)
        at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:326)
        at org.opensolaris.opengrok.search.SearchEngine.searchMultiDatabase(SearchEngine.java:234)
        at org.opensolaris.opengrok.search.SearchEngine.search(SearchEngine.java:350)
        at org.opensolaris.opengrok.search.SearchEngine.search(SearchEngine.java:293)
        at org.opensolaris.opengrok.web.JSONSearchServlet.doGet(JSONSearchServlet.java:120)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:635)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at org.opensolaris.opengrok.web.StatisticsFilter.doFilter(StatisticsFilter.java:55)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at org.opensolaris.opengrok.web.AuthorizationFilter.doFilter(AuthorizationFilter.java:83)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:199)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:478)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140)

This is most probably due to the large number of search hits being returned.

The Tomcat server is running with:

21674:	/usr/jdk/instances/jdk1.8.0/bin/java -Djava.util.logging.config.file=/var/tomca
argv[0]: /usr/jdk/instances/jdk1.8.0/bin/java
argv[1]: -Djava.util.logging.config.file=/var/tomcat8/conf/logging.properties
argv[2]: -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
argv[3]: -XX:+HeapDumpOnOutOfMemoryError
argv[4]: -XX:HeapDumpPath=/var/tomcat8/logs/dumps
argv[5]: -d64
argv[6]: -server
argv[7]: -Xmx8g
argv[8]: -Djdk.tls.ephemeralDHKeySize=2048
argv[9]: -Djava.protocol.handler.pkgs=org.apache.catalina.webresources
argv[10]: -classpath
argv[11]: /usr/tomcat8/bin/bootstrap.jar:/usr/tomcat8/bin/tomcat-juli.jar
argv[12]: -Dcatalina.base=/var/tomcat8
argv[13]: -Dcatalina.home=/usr/tomcat8
argv[14]: -Djava.io.tmpdir=/var/tomcat8/temp
argv[15]: org.apache.catalina.startup.Bootstrap
argv[16]: start
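
For scale, the heap dump size from the log can be compared with the -Xmx8g limit. The following is a small sketch; `HeapHeadroom` and `gib` are hypothetical names used only for illustration, and the dump file size is only a rough proxy for actual heap occupancy. `Runtime.getRuntime().maxMemory()` reports the limit of whatever JVM runs the snippet, not of the Tomcat process above.

```java
// Hypothetical helper, for illustration only.
class HeapHeadroom {
    // Converts a byte count to GiB (2^30 bytes).
    static double gib(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long dumpBytes = 7_654_991_411L;          // heap dump size reported in the log
        long xmxBytes = 8L * 1024 * 1024 * 1024;  // -Xmx8g from the argv listing

        System.out.printf("heap dump: %.2f GiB (%.0f%% of -Xmx8g)%n",
                gib(dumpBytes), 100.0 * dumpBytes / xmxBytes);
        // Effective limit of the JVM running this snippet.
        System.out.printf("max heap of this JVM: %.2f GiB%n",
                gib(Runtime.getRuntime().maxMemory()));
    }
}
```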

This does not happen when searching across all projects in the web UI (got some 500k hits).

@vladak vladak added the bug label Oct 5, 2017
@tarzanek
Contributor

tarzanek commented Oct 6, 2017

This smells like too many string allocations happening. Put differently, the UI pages its search results, and the JSON search might not do that ...
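
To illustrate the difference between a paged and an unpaged search, here is a minimal sketch in plain Java with no Lucene dependency. `PagedHits`, `loadPage`, and the `IntFunction` loader standing in for `IndexSearcher.doc()` are hypothetical names, not OpenGrok code. The point is that only the hits in the requested page get their stored fields materialized as strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical helper, not OpenGrok code: loads stored documents for one page of hits only.
class PagedHits {
    /**
     * @param hitDocIds document ids of all hits, best first
     * @param start     0-based index of the first hit on the page
     * @param pageSize  maximum number of documents to load
     * @param loader    stand-in for IndexSearcher.doc(int)
     */
    static <D> List<D> loadPage(int[] hitDocIds, int start, int pageSize, IntFunction<D> loader) {
        if (start < 0 || pageSize < 0) {
            throw new IllegalArgumentException("start and pageSize must be non-negative");
        }
        int from = Math.min(start, hitDocIds.length);
        int to = (int) Math.min((long) from + pageSize, hitDocIds.length);
        List<D> page = new ArrayList<>(to - from);
        for (int i = from; i < to; i++) {
            page.add(loader.apply(hitDocIds[i])); // only these documents are ever loaded
        }
        return page;
    }

    public static void main(String[] args) {
        int[] hits = new int[500_000]; // roughly the hit count seen in the web UI
        for (int i = 0; i < hits.length; i++) {
            hits[i] = i;
        }
        int[] loads = {0};
        List<String> page = loadPage(hits, 0, 80, id -> {
            loads[0]++;
            return "doc-" + id;
        });
        System.out.println(page.size() + " documents on the page, " + loads[0] + " loader calls");
    }
}
```

An unpaged variant would call the loader once per hit, so the number of live document objects would grow with the total hit count instead of the page size.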

@tarzanek
Contributor

tarzanek commented Oct 6, 2017

Or rather, maxresults does not seem to be taken into account ...
Lucene 7.0 has some fixes in this regard, but we should fix this in the current code too.
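
One way to make sure the `maxresults` request parameter is honoured is to parse it defensively and use it as an upper bound on how many hits get materialized. This is a minimal sketch: `MaxResults.resolve`, the fallback value, and the hard cap are hypothetical and not part of the existing OpenGrok API.

```java
// Hypothetical helper, not OpenGrok code: turns the raw request parameter into a safe limit.
class MaxResults {
    /**
     * @param raw      raw value of the maxresults parameter, may be null
     * @param fallback limit used when the parameter is missing, malformed, or below 1
     * @param hardCap  server-side upper bound that a request cannot exceed
     */
    static int resolve(String raw, int fallback, int hardCap) {
        int value = fallback;
        if (raw != null) {
            try {
                value = Integer.parseInt(raw.trim());
            } catch (NumberFormatException e) {
                value = fallback; // malformed input falls back to the default
            }
        }
        if (value < 1) {
            value = fallback;
        }
        return Math.min(value, hardCap);
    }

    public static void main(String[] args) {
        // The fallback (25) and cap (1000) are illustrative values only.
        System.out.println(resolve("80", 25, 1000));
        System.out.println(resolve(null, 25, 1000));
        System.out.println(resolve("5000000", 25, 1000));
    }
}
```

The resolved value could then bound both the collector size and the number of documents loaded for the response.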

@vladak
Member Author

vladak commented Apr 17, 2018

Both searchSingleDatabase() and searchMultiDatabase() use a collector for the results, like this:

         collector = TopScoreDocCollector.create(hitsPerPage * cachePages);
         searcher.search(query, collector);

Here the collector parameters come from the Configuration rather than from the request. Still, the default values of hitsPerPage and cachePages are 125 and 5, respectively, so it is a bit surprising that the JVM ran out of memory.
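
For context, a top-N collector conceptually keeps only the N best hits in a bounded priority queue, so its memory use depends on N rather than on the total number of hits. The sketch below is plain Java and not Lucene's implementation; `TopN` and `Hit` are hypothetical names. With the collector sized to 125 * 5 = 625 entries, even 500k hits leave only 625 entries in the queue, which suggests the memory is consumed elsewhere, possibly by the stored documents loaded through IndexSearcher.doc() as seen in the stack trace.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration of bounded top-N collection; not Lucene code.
class TopN {
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final int limit;
    private final PriorityQueue<Hit> heap; // min-heap: the weakest retained hit is on top
    private int totalHits;

    TopN(int limit) {
        if (limit < 1) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.limit = limit;
        this.heap = new PriorityQueue<>(limit, Comparator.comparingDouble((Hit h) -> h.score));
    }

    void collect(int doc, float score) {
        totalHits++; // every hit is counted, but at most 'limit' hits are retained
        if (heap.size() < limit) {
            heap.add(new Hit(doc, score));
        } else if (score > heap.peek().score) {
            heap.poll();
            heap.add(new Hit(doc, score));
        }
    }

    int totalHits() {
        return totalHits;
    }

    // Returns the retained hits, best score first.
    List<Hit> topHits() {
        List<Hit> out = new ArrayList<>(heap);
        out.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return out;
    }

    public static void main(String[] args) {
        int hitsPerPage = 125; // defaults as quoted in the comment above
        int cachePages = 5;
        TopN collector = new TopN(hitsPerPage * cachePages);
        for (int doc = 0; doc < 500_000; doc++) {
            collector.collect(doc, (doc * 31 % 1000) / 1000f); // synthetic scores
        }
        System.out.println(collector.totalHits() + " hits seen, "
                + collector.topHits().size() + " retained");
    }
}
```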

@tarzanek
Contributor

For the new Lucene this should be done in a better way (we have it in two places now that Chris has also recently fixed this in SearchEngine).
This old code needs a bit of improvement ...

@tulinkry
Contributor

Just for the record, this has been moved to http://grok/source/api/v1/search?full=foo

@vladak vladak added the API label Dec 3, 2020
@vladak vladak changed the title from "JSON search across large number of projects results in java.lang.OutOfMemoryError" to "API search across large number of projects results in java.lang.OutOfMemoryError" Dec 3, 2020