INNA Knowledge Management

INNAKM - Copyright Details

This project is a FastAPI-based web service that collects, processes, and stores data from multiple external portals. It uses Motor, the asynchronous MongoDB driver for Python, for non-blocking database access, and transforms inputs from the different sources into a unified format for downstream applications and services.

The primary goal of this project is to aggregate data from external portals (e.g., Wikidata, NCBI, BacDive, GBIF), clean and normalize it, and store the consolidated records in a MongoDB database, so that downstream applications can work from a single, consistent dataset.
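
As a rough illustration of the clean-and-normalize step, the sketch below maps records from two portals onto one shape. The `UnifiedTaxon` class, the per-portal key names, and the sample values are all hypothetical, chosen for the example; they are not the project's actual schema or real portal payloads.

```python
from dataclasses import dataclass, asdict
from typing import Any


@dataclass
class UnifiedTaxon:
    """Hypothetical unified record; the real schema may differ."""
    name: str
    source: str
    source_id: str


def clean_name(raw: str) -> str:
    """Collapse runs of whitespace and trim both ends."""
    return " ".join(raw.split())


def normalize(portal: str, payload: dict[str, Any]) -> UnifiedTaxon:
    """Map a portal-specific payload onto the unified record."""
    # (name key, id key) per portal; these key names are assumptions.
    field_map = {
        "gbif": ("scientificName", "key"),
        "ncbi": ("organism_name", "tax_id"),
    }
    if portal not in field_map:
        raise ValueError(f"unsupported portal: {portal}")
    name_key, id_key = field_map[portal]
    return UnifiedTaxon(
        name=clean_name(str(payload[name_key])),
        source=portal,
        source_id=str(payload[id_key]),
    )


# Made-up sample payloads; the ids are placeholders, not real identifiers.
gbif_doc = normalize("gbif", {"scientificName": "  Bacillus   subtilis ", "key": 101})
ncbi_doc = normalize("ncbi", {"organism_name": "Bacillus subtilis", "tax_id": "202"})
print(asdict(gbif_doc))
```

Keeping the per-portal differences in one mapping table means that adding a portal is a matter of adding an entry, while everything downstream only sees `UnifiedTaxon`.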

I designed and implemented a FastAPI-based data aggregation service that collects, processes, and stores information from multiple external portals such as Wikidata, NCBI, BacDive, and GBIF. I built asynchronous data pipelines with Motor so that MongoDB operations do not block the event loop, which keeps the service responsive under high-throughput workloads. My work focused on cleaning heterogeneous data sources and normalizing them into a unified schema, making the data ready for downstream applications and services. I also structured the API architecture, implemented error handling, and maintained the project on GitHub for version control and collaborative development.
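
The non-blocking pipeline idea can be sketched with plain asyncio: every portal is queried concurrently, and a failure in one does not stop the others. The `fetch_portal` coroutine here is a stand-in that sleeps instead of making an HTTP request or a Motor call, and the simulated outage is invented for the demo; only the portal names come from the text above.

```python
import asyncio


async def fetch_portal(portal: str, taxon: str) -> dict:
    """Stand-in for a real portal request (hypothetical; no network I/O)."""
    await asyncio.sleep(0.01)  # simulates waiting on the network
    if portal == "BacDive":
        raise RuntimeError("simulated portal outage")
    return {"portal": portal, "taxon": taxon}


async def collect(taxon: str, portals: list[str]) -> tuple[list[dict], dict[str, str]]:
    """Query all portals concurrently; return (results, errors by portal)."""
    outcomes = await asyncio.gather(
        *(fetch_portal(p, taxon) for p in portals),
        return_exceptions=True,  # keep going when one portal fails
    )
    results: list[dict] = []
    errors: dict[str, str] = {}
    # gather() returns outcomes in the same order as the inputs.
    for portal, outcome in zip(portals, outcomes):
        if isinstance(outcome, Exception):
            errors[portal] = str(outcome)
        else:
            results.append(outcome)
    return results, errors


results, errors = asyncio.run(
    collect("Bacillus subtilis", ["Wikidata", "NCBI", "BacDive", "GBIF"])
)
print(len(results), "portals answered; failed:", list(errors))
```

Because the waits overlap, the total time is close to the slowest single portal rather than the sum of all of them.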

Features
  • Automatically aggregates species data from multiple authoritative portals such as Wikidata, NCBI, BacDive, and GBIF.
  • Retrieves data asynchronously using FastAPI, httpx, and Motor for high-performance, non-blocking operations.
  • Parses and normalizes heterogeneous data formats (JSON, XML, HTML, SPARQL) into a unified structure.
  • Stores both raw and processed data separately for traceability and auditing purposes.
  • Structures species knowledge into semantic sections like taxonomy, morphology, physiology, genome, and occurrences.
  • Provides a full-text search feature using MongoDB text indexes for fast species discovery.
  • Exposes RESTful APIs for managing taxa, portals, raw data, and structured terms.
  • Implements retry and fallback mechanisms to ensure reliable data retrieval from external services.
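
One possible shape for the retry and fallback behaviour listed above is shown below. It is a minimal sketch, not the project's implementation: the function names, the attempt count, and the exponential backoff delays are assumptions, and the fetchers are fakes that simulate failures rather than real httpx calls.

```python
import asyncio
from typing import Awaitable, Callable

Fetcher = Callable[[], Awaitable[dict]]


async def fetch_with_retry(fetch: Fetcher, attempts: int = 3, base_delay: float = 0.01) -> dict:
    """Call `fetch` up to `attempts` times, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return await fetch()
        except Exception:  # a real service would catch only transient errors
            if attempt == attempts - 1:
                raise  # out of retries: let the caller decide
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


async def fetch_with_fallback(primary: Fetcher, fallback: Fetcher) -> dict:
    """Retry the primary source; if it still fails, retry the fallback."""
    try:
        return await fetch_with_retry(primary)
    except Exception:
        return await fetch_with_retry(fallback)


# Fake fetchers for the demo (hypothetical).
calls = {"flaky": 0}


async def flaky() -> dict:
    calls["flaky"] += 1
    if calls["flaky"] < 3:
        raise ConnectionError("simulated timeout")
    return {"source": "primary"}


async def always_down() -> dict:
    raise ConnectionError("simulated outage")


async def mirror() -> dict:
    return {"source": "fallback"}


recovered = asyncio.run(fetch_with_retry(flaky))
rerouted = asyncio.run(fetch_with_fallback(always_down, mirror))
print(recovered, rerouted)
```

In the service, each fetcher would presumably wrap an httpx request to one portal endpoint, with the fallback pointing at an alternative endpoint or a cached copy.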

Architecture
  • FastAPI for the web service framework, enabling high performance and easy-to-use APIs.
  • MongoDB as the database, chosen for a flexible document model that suits heterogeneous data from different portals.
  • Motor for asynchronous operations with MongoDB, ensuring non-blocking I/O and efficient performance for high-throughput tasks.
  • GitHub for version control and collaboration, supporting efficient code management and team-based development workflows.
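
To make the Motor and MongoDB pieces more concrete, here is a hedged sketch of how the full-text search could be issued through Motor. The indexed field names (`name`, `description`) and the function names are assumptions; the `collection` argument is expected to be a Motor collection, which is why nothing from Motor is imported here.

```python
from typing import Any


def text_search_args(query: str) -> tuple[dict, dict, list]:
    """Build the filter, projection, and sort for a MongoDB $text query."""
    score = {"$meta": "textScore"}
    return {"$text": {"$search": query}}, {"score": score}, [("score", score)]


async def ensure_text_index(collection: Any) -> None:
    """Create a text index; the indexed field names are assumptions."""
    await collection.create_index([("name", "text"), ("description", "text")])


async def search_species(collection: Any, query: str, limit: int = 10) -> list[dict]:
    """Return the best matches first; `collection` is a Motor collection."""
    filter_doc, projection, sort_spec = text_search_args(query)
    cursor = collection.find(filter_doc, projection).sort(sort_spec).limit(limit)
    return await cursor.to_list(length=limit)


# The query documents can be inspected without a database connection.
filter_doc, projection, sort_spec = text_search_args("bacillus")
print(filter_doc)
```

With a running MongoDB instance, the collection would come from something like `AsyncIOMotorClient(uri)["innakm"]["taxa"]`, where the database and collection names are placeholders.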

Project information