Nonprofits and Data


This group is for those interested in learning and sharing about all things data-related for nonprofits. The Nonprofits and Data group is for people using data to serve a mission, either directly or by improving nonprofits and the nonprofit sector. That includes everything from collecting data and managing databases to analytics, data visualization and data mining. Here are some examples of topics we discuss: using data to improve organizational effectiveness, measuring impact, using data for storytelling, tools for data management and analysis, figuring out the “right” data to collect, and learning skills to help us use data better.

Tool for large data file manipulation

  • 1.  Tool for large data file manipulation

    Posted Jun 22, 2018 12:43
    Hello All,

    We periodically need to provide a list of our members for legal cases. Our criterion for a legal member is someone who has taken three or more advocacy actions in the last 12 months. In the past, we built this list by exporting all actions from our online CRM, importing the file into Access, and having Access count the number of actions per constituent. We've had exponential growth over the last year, to the point that the export file (now 3GB) is too large for Access, which caps a database file at 2GB.
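    The counting step described above (count each constituent's actions in the window, keep those with three or more) does not strictly need a database if the export is a CSV file. Below is a minimal sketch in plain Python that streams the file one row at a time, so memory use grows with the number of distinct constituents rather than with the 3GB file size. It assumes hypothetical column names `constituent_id` and `action_date` (ISO `YYYY-MM-DD`); a real CRM export will likely use different headers.

```python
import csv
from collections import Counter
from datetime import date


def legal_members(path, cutoff, min_actions=3):
    """Return sorted IDs of constituents with >= min_actions actions on or after cutoff.

    Assumes hypothetical CSV columns `constituent_id` and `action_date` (YYYY-MM-DD).
    """
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        # DictReader yields one row at a time, so the whole file is never held in memory.
        for row in csv.DictReader(f):
            if date.fromisoformat(row["action_date"]) >= cutoff:
                counts[row["constituent_id"]] += 1
    return sorted(cid for cid, n in counts.items() if n >= min_actions)
```

    Usage would look like `legal_members("all_actions.csv", cutoff=date(2017, 6, 22))`, where the file name is hypothetical.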

    Does anyone know of a data tool that can accomplish this?


    Denise Cummings
    Data Systems Administrator
    Friends of the Earth
    Washington, DC

  • 2.  RE: Tool for large data file manipulation

    Posted Jun 22, 2018 14:58
    Hi Denise,

    MySQL can handle an import of that size, and the import could even be scripted. Alternatively, the histogram feature on may work for the information you're trying to extract.
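    Once the actions are loaded into any SQL database, the count becomes a single `GROUP BY ... HAVING` query. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL so it runs without a server; the table and column names (`actions`, `constituent_id`, `action_date`) are hypothetical. In MySQL itself, a multi-gigabyte CSV would more typically be bulk-loaded with `LOAD DATA INFILE` rather than inserted row by row.

```python
import sqlite3

# Standard SQL: count actions per constituent in the window, keep those at or above the threshold.
QUERY = """
    SELECT constituent_id, COUNT(*) AS n_actions
    FROM actions
    WHERE action_date >= ?
    GROUP BY constituent_id
    HAVING COUNT(*) >= ?
    ORDER BY constituent_id
"""


def legal_members_sql(rows, cutoff, min_actions=3):
    """Load (constituent_id, action_date) tuples and return (id, count) pairs that qualify.

    SQLite in memory is used here only as a stand-in for MySQL; the schema is hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE actions (constituent_id TEXT, action_date TEXT)")
        conn.executemany("INSERT INTO actions VALUES (?, ?)", rows)
        # ISO dates (YYYY-MM-DD) sort and compare correctly as plain strings.
        return conn.execute(QUERY, (cutoff, min_actions)).fetchall()
    finally:
        conn.close()
```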

    Adam London
    Project Donor Love
    San Francisco, CA

  • 3.  RE: Tool for large data file manipulation

    Posted Jul 17, 2018 02:06
    Edited by Shubham Mangal Jul 17, 2018 03:05
    MySQL is a good option for handling large files. It is the world's most popular open-source database. It also offers cross-platform support; a Performance Schema that collects and aggregates statistics about server execution and query performance for monitoring; SQL mode options to control runtime behavior; full-text indexing and searching; and an embedded database library. MySQL can also run on cloud computing platforms such as Microsoft Azure and Oracle Cloud Infrastructure.

    Hadoop can also handle large data files, but working with it requires some knowledge of Java.

    Shubham Mangal
    Non-Profit Web Analyst


  • 4.  RE: Tool for large data file manipulation

    Posted Sep 12, 2018 20:44
    Hi Denise,

    I second the tools mentioned above as ones that can handle files that large. I'm personally a fan of developing my programming skills by working with data in Python and Pandas, and I found an article, "Working with large CSV files in Python," that describes how I might approach this if I had the time to experiment and learn.
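    The chunked approach for large CSV files can be sketched with pandas: `read_csv` with `chunksize` returns an iterator of DataFrames, and `usecols` loads only the columns you name. Per-chunk counts are added to a running total, so no chunk ever has to hold the whole file. This is a minimal sketch, not the article's code; the column names `constituent_id` and `action_date` and the default chunk size are assumptions.

```python
import pandas as pd


def count_actions_chunked(path_or_buffer, cutoff, min_actions=3, chunksize=1_000_000):
    """Count actions per constituent on or after `cutoff`, reading the CSV in chunks.

    Assumes hypothetical columns `constituent_id` and `action_date`.
    """
    totals = pd.Series(dtype="int64")
    reader = pd.read_csv(
        path_or_buffer,
        usecols=["constituent_id", "action_date"],  # ignore every other column
        dtype={"constituent_id": str},
        parse_dates=["action_date"],
        chunksize=chunksize,
    )
    for chunk in reader:
        recent = chunk[chunk["action_date"] >= pd.Timestamp(cutoff)]
        # Merge this chunk's counts into the running totals; an ID missing on one side counts as 0.
        totals = totals.add(recent["constituent_id"].value_counts(), fill_value=0)
    totals = totals.astype("int64")
    return totals[totals >= min_actions].sort_index()
```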

    As someone who has spent a lot of time looking for tools, I often end up finding a solution that makes the question moot. It is hard to tell from your question, but can a single member have multiple actions on that large list? Is there any way in the CRM to select unique constituents?

    Another thought: how much of the all-actions export from the CRM is actually the member's name or unique identifier? If all you need is the member, and much of that 3GB is extra data such as action dates or action types, you could cut down the file size by exporting only the member column.
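    If the CRM cannot limit the export itself, the same trimming can be done after the fact by streaming the file once and keeping only the needed columns. A minimal sketch with Python's standard `csv` module; the function name and column names are hypothetical. Note that the 12-month criterion needs the action date as well as the member ID, unless the export is already limited to that window.

```python
import csv


def keep_columns(src_path, dst_path, columns):
    """Copy only `columns` from the source CSV to a new CSV, one row at a time."""
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        available = reader.fieldnames or []  # None if the file is empty
        missing = [c for c in columns if c not in available]
        if missing:
            raise ValueError(f"columns not in export: {missing}")
        # extrasaction="ignore" silently drops every column not listed in `columns`.
        writer = csv.DictWriter(dst, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            writer.writerow(row)
```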

    Good luck!

    Colin Roberts
    Rainier Scholars
    Seattle, WA

  • 5.  RE: Tool for large data file manipulation

    Posted Sep 13, 2018 14:28
    There are quite a few options for doing this, but the question itself has me wondering.

    1. Can you use the native reporting tool in your CRM to execute this query? 
    2. Rather than exporting all actions, could you export only the last 12 months? Or are you already doing that? More generally, can you refine the export so that it's manageable?
    3. Rather than using Access, you could use another database such as MySQL. However, I'd be curious to learn about your overall reporting approach, to see whether there is some way to leverage your existing frameworks or, alternatively, to use this as an opportunity to build a data warehouse solution.
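    For point 2, whether the filter is applied in the CRM export or afterward, "the last 12 months" needs a concrete cutoff date. A small sketch of one reasonable definition: the same calendar day one year earlier, falling back to Feb 28 when the as-of date is Feb 29. The function names are hypothetical, and whether the boundary day itself counts is an assumption to confirm against the legal definition.

```python
from datetime import date


def twelve_months_before(d):
    """Return the same calendar date one year earlier.

    date.replace raises ValueError for Feb 29 when the previous year is not a
    leap year, so that case falls back to Feb 28.
    """
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        return d.replace(year=d.year - 1, day=28)


def in_last_12_months(action_date, as_of):
    """True if action_date falls in (cutoff, as_of]; excluding the cutoff day is an assumption."""
    return twelve_months_before(as_of) < action_date <= as_of
```

    For example, `twelve_months_before(date(2018, 9, 13))` gives `date(2017, 9, 13)`, and `twelve_months_before(date(2020, 2, 29))` gives `date(2019, 2, 28)`.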

    Isaac Shalev
    Stamford, CT