Nonprofits and Data


The Nonprofits and Data group is for anyone interested in learning and sharing about all things data-related for nonprofits: people using data to serve a mission, either directly or by improving nonprofits and the nonprofit sector. That includes everything from collecting data and managing databases to analytics, data visualization, and data mining. Here are some examples of topics we discuss: using data to improve organizational effectiveness, measuring impact, using data for storytelling, tools for data management and analysis, figuring out the “right” data to collect, and learning skills to help us use data better.

Warehouses, pipelines, and bots, oh my!

  • 1.  Warehouses, pipelines, and bots, oh my!

    Posted Apr 11, 2019 10:28
    Hi everyone,

    I'm looking into beefing up the usage of the data warehouse we use at our organization, and I'm wondering if anyone has found any good resources on creating/maintaining data pipelines and using bots to automate data extractions.

    My main concern is that many of the databases that we're mandated to use don't have APIs or automated export options. At this year's NTC, I attended a session on using bots to automate the ETL process, but I'm not sure if it's possible to do without more 'connectable' systems. Does anyone have experience with automating extracts from these types of systems? Or resources for how to go about setting up a structure for automating this process?

    Many thanks!

    Charlie

    ------------------------------
    Charles Riebeling
    Assistant Director of Learning and Evaluation
    Carlos Rosario International Public Charter School
    Washington, DC
    ------------------------------
    Tech Accelerate


  • 2.  RE: Warehouses, pipelines, and bots, oh my!

    Posted Apr 12, 2019 08:06
    Hi Charles,

    When the source system does not have a robust API or automated export options, we have done this kind of work using bots that actually log in to the system and perform the manual steps it takes to trigger an export or, in some cases, run a report and scrape it to get the data. We then transform that .csv file into the appropriate format and upload it to an FTP server, where another bot picks it up and imports it into the warehouse.
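    A minimal sketch of the scrape-then-transform step of that pattern, using only the Python standard library (the function names, the HTML shape, and the sample report are all invented for illustration; the login and FTP-upload steps are only hinted at in comments):

    ```python
    # Hypothetical sketch: turn a scraped HTML report table into CSV text.
    import csv
    import io
    from html.parser import HTMLParser

    class TableScraper(HTMLParser):
        """Collect the text of every <td>/<th> cell, one row per <tr>."""
        def __init__(self):
            super().__init__()
            self.rows, self._row, self._in_cell = [], [], False
        def handle_starttag(self, tag, attrs):
            if tag == "tr":
                self._row = []
            elif tag in ("td", "th"):
                self._in_cell = True
        def handle_endtag(self, tag):
            if tag == "tr" and self._row:
                self.rows.append(self._row)
            elif tag in ("td", "th"):
                self._in_cell = False
        def handle_data(self, data):
            if self._in_cell:
                self._row.append(data.strip())

    def report_html_to_csv(html: str) -> str:
        """Re-shape a scraped HTML report as CSV text for the warehouse."""
        scraper = TableScraper()
        scraper.feed(html)
        buf = io.StringIO()
        csv.writer(buf).writerows(scraper.rows)
        return buf.getvalue()

    # In a real pipeline, a browser-automation bot (e.g. Selenium) would log in
    # and fetch `html`, and another step would push the CSV to the FTP drop
    # folder (e.g. with the stdlib ftplib module).
    sample = "<table><tr><th>id</th><th>name</th></tr><tr><td>1</td><td>Ana</td></tr></table>"
    print(report_html_to_csv(sample))
    ```
    
    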

    Where there is a will.

    Regards,

    Molly

     



    ------------------------------
    Molly Kelly
    Vice President, Digital Solutions
    molly@zurigroup.com
    ------------------------------



  • 3.  RE: Warehouses, pipelines, and bots, oh my!

    Posted Apr 12, 2019 08:58
    Hi Charles,

    There are lots of ways to skin a cat. The main concern I have with investing in automation of systems that don't have APIs or automated export options is whether the push for automation is obscuring a bad fit with the system itself. It may be better to upgrade the underlying systems and applications you're using, and get into a platform and environment that suits you better in lots of ways, including making it easier to move data around automatically.

    ------------------------------
    Isaac Shalev
    http://www.sage70.com
    Stamford CT
    @Sage70
    isaac@sage70.com
    ------------------------------



  • 4.  RE: Warehouses, pipelines, and bots, oh my!

    Posted Apr 12, 2019 18:47
    Hi Charles,

    Isaac makes an excellent point. If your system cannot interface with a modern data warehouse, then trying to force connectivity might be like putting lipstick on a pig; it may make more sense to consider modernizing that infrastructure instead.

    I should also add that, cost-wise, what initially appears to be a quick, inexpensive fix often uncovers other gaps, and ultimately turns out to be an expensive yet half-baked option that costs even more to re-architect later.


    ------------------------------
    Medha Nanal
    Strategic Data/Database Consultant for Nonprofits (Fundraising, Operations, Programs)
    www.topcloudconsult.com
    medhananal@topcloudconsult.com
    650.600.9374
    ------------------------------



  • 5.  RE: Warehouses, pipelines, and bots, oh my!

    Posted Apr 30, 2019 15:02
    Thanks, everyone!

    I agree - in a perfect world, the underlying system wouldn't require this type of solution in order to regularly export data or connect to a warehouse. Unfortunately, we're mandated to use a few systems that just don't support this type of connectivity at this stage.

    Molly, what type of application do you use to create and run the bots? I'm not quite sure where/how to get started.

    Thanks again!

    Charlie

    ------------------------------
    Charles Riebeling
    Assistant Director of Accountability
    Carlos Rosario International Public Charter School
    Washington, DC
    ------------------------------



  • 6.  RE: Warehouses, pipelines, and bots, oh my!

    Posted Apr 30, 2019 17:54
    Charles, the general term you're looking for is 'scraping', which means collecting data from web pages or other outputs that are intended for human consumption. You can look online for all kinds of screen-scraping applications and tools. However, scraping is an especially fragile approach to grabbing data because it breaks whenever the page design changes.

    I think the first question is how often you need to get data into the warehouse, i.e., how fresh the data in the warehouse needs to be, especially for data coming from the disconnected systems. The second question is what the process is for extracting the data, transforming it, and loading it (ETL) into the warehouse. If the answers are 'we don't need especially fresh data, and the ETL is relatively easy to do,' a scheduled manual process may be fine. As those answers start to move in the other direction, you can ease or automate different aspects of it. Screen-scraping is a tool of last resort, though it definitely has its uses.

    ------------------------------
    Isaac Shalev
    http://www.sage70.com
    Stamford CT
    @Sage70
    isaac@sage70.com
    ------------------------------



  • 7.  RE: Warehouses, pipelines, and bots, oh my!

    Posted May 01, 2019 02:46
    Hi Charles,

    You may have already looked into this, in which case this suggestion won't be helpful, but you only mentioned APIs and automated data export features. You may be able to access the databases directly without either. You would need credentials (host address, username, password, etc.) to connect this way, which you may have to request from your providers, but it's worth considering if you haven't already. You would also need a tool that can talk to the particular database format (MySQL, PostgreSQL, etc.).

    For example, we have an application with an API and reporting features built in, including bulk export. However, I'm still asking the provider for credentials to access the database directly. It's your data and you should have access to it. This skips the need to download a CSV, and it's more complete than an API, which might be built to expose only certain fields.
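    A hedged sketch of what a direct-access extract can look like. SQLite (in the Python standard library) stands in here so the example runs anywhere; against a vendor-hosted MySQL or PostgreSQL backend you would swap in a driver like mysql-connector-python or psycopg2 plus the credentials the provider gives you. The table and schema are invented for illustration:

    ```python
    # Hypothetical sketch: pull rows straight from the backing database
    # instead of relying on an API or a manual CSV export.
    import sqlite3

    def extract_table(conn, table: str) -> list:
        """Read every row of `table`. In production you would typically
        filter by a modified-since column to keep extracts incremental,
        and only pass allow-listed table names into the query."""
        cur = conn.execute(f"SELECT * FROM {table}")
        return cur.fetchall()

    # Demo with an in-memory database (invented schema).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO students VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])

    rows = extract_table(conn, "students")
    print(rows)
    ```
    
    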

    Good luck!

    ------------------------------
    Joe Bobman
    (pronouns: he/him/his)
    Technology & Engagement Manager
    Food Forward
    Los Angeles, CA
    ------------------------------



  • 8.  RE: Warehouses, pipelines, and bots, oh my!

    Posted May 01, 2019 13:18
    Charles, as you mention databases you're "mandated to use" and you appear to be in publicly funded education, are you bound by FERPA? If so, I imagine any system/process you use for this needs to be auditable/compliant? I work in HIPAA-land.

    While some responses mention webpage scraping, it's not 100% clear whether you're talking about data living in a frontend presentation, backend (No)/SQL document store/tables, or some combination. In any case, are you using a web application framework (Rails, Django, Laravel, etc.) elsewhere in your org? Any of these would have an Object Relational Mapper/Database Abstraction Layer for connecting to database backends with secure authentication methods, and helpers for making DOM/HTML scraping more manageable. Either way, you would of course need to know the data model details.
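    One hedged sketch of the ORM/Database Abstraction Layer idea, using SQLAlchemy (a common Python DBAL; Rails, Django, and Laravel have equivalents). Reflection discovers the data model from the live database rather than requiring it to be hand-written up front. SQLite stands in for the real connection URL (e.g. something like "postgresql://user:pass@host/dbname"), and the table is invented for the demo:

    ```python
    # Hypothetical sketch: reflect a live database's schema, then query it
    # through the abstraction layer instead of hand-rolled SQL per backend.
    from sqlalchemy import create_engine, MetaData, select

    engine = create_engine("sqlite:///:memory:")

    # Throwaway table so the demo is self-contained (invented schema).
    with engine.begin() as conn:
        conn.exec_driver_sql("CREATE TABLE clients (id INTEGER, name TEXT)")
        conn.exec_driver_sql("INSERT INTO clients VALUES (1, 'Ana'), (2, 'Ben')")

    # Reflection reads table definitions out of the database itself,
    # so you learn the data model details instead of guessing them.
    metadata = MetaData()
    metadata.reflect(bind=engine)
    clients = metadata.tables["clients"]

    with engine.connect() as conn:
        rows = conn.execute(select(clients)).fetchall()
    print(rows)
    ```
    
    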

    ------------------------------
    Winston Berger
    Data Systems Manager
    A Better Way, Inc.
    Berkeley, CA
    ------------------------------
