
slow loading time to view forms w lots of submissions #89

Open
danbjoseph opened this issue Sep 22, 2016 · 8 comments

@danbjoseph
Member

danbjoseph commented Sep 22, 2016

When a form has a large number of submissions, it takes a very long time to load, even with a decent internet connection; a slow connection makes it almost impossible. The buildings form, with 3,694 submissions, is an example. Maybe it could load only 100 and then have an option to load the rest. This is an issue with cloud-hosted instances of OMK Server.

@hallahan
Contributor

Yes, it makes sense that it gets slow at that point. Implementing pagination will resolve that.
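A pagination layer could look roughly like the sketch below. `paginate` is a hypothetical helper, not an existing OMK server function, and the offset/limit semantics are an assumption:

```javascript
// Hypothetical offset/limit pagination over an in-memory submissions array.
// In the real server this slicing would happen before the response is built,
// so the client only ever receives one page at a time.
function paginate(submissions, page, pageSize) {
  var start = page * pageSize;
  return {
    page: page,
    pageSize: pageSize,
    total: submissions.length,
    results: submissions.slice(start, start + pageSize)
  };
}
```

The client would then request page 0, render it immediately, and fetch further pages on demand.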

@hallahan
Contributor

Have you tried it with @ahmedOpeyemi's JSON Stream work?

@hallahan
Contributor

Another possible bottleneck is the file system. Many file systems degrade in performance once a directory holds more than a certain number of files. The solution is simple: break the submissions up into more directories.

I've seen Who's on First, NGINX cache directories, Flickr, and others do this.

I'd have to get the data and do some profiling to determine what is contributing to your problem. Most likely pagination will resolve it.

@ahmedOpeyemi
Contributor

ahmedOpeyemi commented Sep 23, 2016

So, this is one of the issues #85 aimed to resolve on the server end. We had about 8,500+ submissions, and issue #78 was occurring on the server end.
Calling the endpoint directly from your browser at http://[server_url]/omk/odk/submissions/[form_id].json with the fix in the PR shows that it returns an appropriate result in good time; flattening it out on the client end is where the bottleneck currently is.

@hallahan
Contributor

Flattening? You're saying that it's just too much for one table in a browser to handle?

@ahmedOpeyemi
Contributor

@hallahan
Here is where the delay is coming from.
Below is the flattening function I was talking about:

function doCSV(json) {
    // 1) find the primary array to iterate over
    // 2) for each item in that array, recursively flatten it into a tabular object
    // 3) turn that tabular object into a CSV row using jquery-csv
    var inArray = arrayFrom(json);

    var outArray = [];
    for (var i = 0; i < inArray.length; i++) {
        outArray.push(parseObject(inArray[i]));
    }

    var csv = $.csv.fromObjects(outArray);

    // excerpt and render first 10 rows
    renderCSV(outArray);
    showCSV(true);

    // show raw data if people really want it
    $(".csv textarea").val(csv);

    // download link to entire CSV as data
    // thanks to http://jsfiddle.net/terryyounghk/KPEGU/
    // and http://stackoverflow.com/questions/14964035/how-to-export-javascript-array-info-to-csv-on-client-side
    var uri = "data:text/csv;charset=utf-8," + encodeURIComponent(csv);
    $("#downloadCsv").attr("href", uri).attr("download", getParam("form") + ".csv");
    $("#downloadJson").attr("href", OMK.jsonUrl()).attr("download", getParam("form") + ".json");
}
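The `parseObject` helper itself isn't shown in the thread; a minimal recursive flattener in the same spirit might look like this (the dot-separated key convention is an assumption for illustration):

```javascript
// Recursively flatten a nested submission object into a single-level
// object with dot-separated keys, suitable as one CSV row.
// Illustrative only; not the actual parseObject implementation.
function flatten(obj, prefix, out) {
  out = out || {};
  for (var key in obj) {
    var value = obj[key];
    var name = prefix ? prefix + '.' + key : key;
    if (value !== null && typeof value === 'object') {
      flatten(value, name, out); // recurse into nested objects and arrays
    } else {
      out[name] = value;
    }
  }
  return out;
}
```

Doing this once per submission, for tens of thousands of submissions, is the client-side cost being described.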

@hallahan
Contributor

Oh right. We do need to "flatten" the JSON so that it works in table form.

@hallahan
Contributor

After spending some time examining this problem with a large dataset of 33,578 submissions, I'm seeing the slowness of the submissions page coming from two things:

  1. The submissions.json response takes quite some time, as the server has to open 33,578 directories and aggregate each data.json. This is unavoidable if you want all of the submissions in one go.
  2. As @ahmedOpeyemi explains, the doCSV function takes some time, because it has to iterate through each submission and flatten them to create the table and the CSV object.

Ideally we would have pagination in the REST API that would give back separate responses for smaller chunks of the JSON from the server. The nice table we see, with its search and sort functionality, would however have to be thrown away and replaced.

Right now, the CSV download and text area are done in the browser, which is slow. The next step is to create a .csv REST endpoint where this is done on the server instead.

I think, with what we have right now, we should just create this .csv endpoint and check how many submissions there are. If the count is above a certain threshold, we should hide the table view and provide links for the user to download the CSV and JSON.

Then, we can later implement pagination.
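The threshold check proposed above is simple client-side logic; the cutoff value and the helper name in this sketch are assumptions for illustration:

```javascript
// Hypothetical cutoff: render the interactive table only for small forms;
// above the threshold, fall back to the download links alone.
var MAX_TABLE_ROWS = 1000; // assumed threshold, would be tuned from profiling

function shouldRenderTable(submissionCount) {
  return submissionCount <= MAX_TABLE_ROWS;
}
```

At the 33,578-submission scale discussed here, this would skip the table entirely and only offer the server-generated CSV and JSON downloads.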
