apoc.load.ldap LDAPException: Sizelimit Exceeded (4). How to set page results for it?

I'm trying to use apoc.load.ldap to load data from Microsoft Active Directory into Neo4j. The query works if I provide filter criteria that limit the number of entries returned from LDAP. But when the LDAP result exceeds the 5,000-entry maximum, the query fails.

The same issue occurs with the ldapsearch command. There, it can be resolved by adding the "page results" command-line option (-E pr=1000/noprompt), like:

    ldapsearch -LLL -H ldap://myldap:389 -b 'cn=Users,dc=my,dc=ds,dc=company,dc=com' -D 'userid' -w 'password' -E pr=1000/noprompt '(&(objectClass=group)(member=*))' cn uid objectClass

How can I configure the equivalent "page results" option for apoc.load.ldap to resolve the Sizelimit Exceeded error?

A fragment from the log file showing the error:

LDAPException: Sizelimit Exceeded (4) Sizelimit Exceeded
LDAPException: Matched DN:
        at com.novell.ldap.LDAPResponse.getResultException(LDAPResponse.java:407)
        at com.novell.ldap.LDAPResponse.chkResultCode(LDAPResponse.java:370)
        at com.novell.ldap.LDAPSearchResults.next(LDAPSearchResults.java:289)
        at apoc.load.LoadLdap$SearchResultsIterator.get(LoadLdap.java:213)
        at apoc.load.LoadLdap$SearchResultsIterator.next(LoadLdap.java:204)
        at apoc.load.LoadLdap$SearchResultsIterator.next(LoadLdap.java:186)

Example query:

    call apoc.load.ldap("msad",
    {searchBase : "dc=my,dc=ds,dc=company,dc=com",searchScope : "SCOPE_SUB"
    ,attributes : ["member","cn","uid","objectClass"]
    ,searchFilter: "(&(objectClass=*)(member=*)(cn=z*))"}) yield entry
    merge (g:SecurityADGroup {name : entry.cn})
    foreach (member in entry.member |
      merge (p:AccountUser { uid : split(substring(member,3),',')[0] })
      merge (p)-[:IS_MEMBER]->(g)
    )

Version: 3.5.4

I know AD limits the number of results a search can return.
I don't know how to fix this with the APOC tool.

But you could try a Python3 script I wrote:

I remembered that the Python LDAP module used in the script retrieves the data from Active Directory in "batches".
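
To illustrate what retrieving in "batches" could look like, here is a minimal sketch of the cookie loop behind LDAP paged results (the same control that ldapsearch's -E pr=... requests). It is not the script mentioned above. The names iter_paged and ldap3_fetcher are hypothetical helpers, and the ldap3 part assumes an already-bound ldap3 Connection:

```python
from typing import Callable, Iterator, List, Optional, Tuple

# OID of the Simple Paged Results control (RFC 2696).
PAGED_RESULTS_OID = "1.2.840.113556.1.4.319"

Page = Tuple[List[dict], Optional[bytes]]


def iter_paged(fetch_page: Callable[[Optional[bytes]], Page]) -> Iterator[dict]:
    """Yield entries page by page until the server hands back an empty cookie."""
    cookie = None
    while True:
        entries, cookie = fetch_page(cookie)
        yield from entries
        if not cookie:  # empty or missing cookie means this was the last page
            break


def ldap3_fetcher(conn, search_base, search_filter, attributes, page_size=1000):
    """Hypothetical adapter: wrap a bound ldap3 Connection as a fetch_page callable."""
    from ldap3 import SUBTREE  # imported lazily so iter_paged has no dependency

    def fetch_page(cookie):
        conn.search(search_base, search_filter, search_scope=SUBTREE,
                    attributes=attributes,
                    paged_size=page_size, paged_cookie=cookie)
        # Keep real entries only; skip search result references.
        entries = [e for e in conn.response if e.get("type") == "searchResEntry"]
        next_cookie = conn.result["controls"][PAGED_RESULTS_OID]["value"]["cookie"]
        return entries, next_cookie

    return fetch_page
```

With a bound connection, something like `for entry in iter_paged(ldap3_fetcher(conn, base, "(&(objectClass=group)(member=*))", ["cn", "member"])): ...` would walk all pages. ldap3 also ships its own `conn.extend.standard.paged_search(...)` helper that wraps this loop.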

Yours Kindly

@omerule Thank you for sharing this alternative! I am looking at alternative ways to try to accomplish this. Depending on how those approaches work out, I may be looking at yours also. But it would be nice to know if there were a configuration setting that could be set for the apoc.load.ldap procedure that provides the type of functionality that the "-E pr=1000/noprompt" option provides on the ldapsearch command to allow larger result sets to be used.

Here is a note from the ldap3 Python module documentation about this limitation with Active Directory: https://ldap3.readthedocs.io/searches.html#search-constraints
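
Since the original query succeeds when the filter narrows the result set (as with cn=z*), one possible workaround that stays within apoc.load.ldap is to run the query once per cn prefix. The sketch below only generates the filter strings. prefixed_filters is a hypothetical helper, and it assumes cn matching is case-insensitive and that each prefix stays under the server's size limit. Entries whose cn starts with a character outside the prefix set would be missed.

```python
import string
from typing import Iterable, Iterator


def prefixed_filters(
    base_filter: str,
    attribute: str = "cn",
    prefixes: Iterable[str] = string.ascii_lowercase + string.digits,
) -> Iterator[str]:
    """Yield one LDAP filter per prefix by AND-ing base_filter with (attribute=<prefix>*)."""
    for prefix in prefixes:
        yield f"(&{base_filter}({attribute}={prefix}*))"


if __name__ == "__main__":
    # Each printed filter could be passed as searchFilter in a separate apoc.load.ldap call.
    for ldap_filter in prefixed_filters("(&(objectClass=*)(member=*))"):
        print(ldap_filter)
```

If a single prefix still exceeds the limit, the prefixes could be made longer (for example "aa", "ab", ...) for that part of the directory.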