# Simple Web Scraping Colly App with Fiber

A Go application using Fiber, Colly v2, and GORM to scrape websites and persist data in PostgreSQL.
## Prerequisites
- Docker and Docker Compose
## How to Run

1. Clone the repository.
2. Navigate to the project directory:

   ```shell
   cd colly-gorm
   ```

3. Copy the example env file:

   ```shell
   cp app/app.env.example app/app.env
   ```

4. Start the stack:

   ```shell
   docker compose up --build
   ```
## Project Structure

```text
colly-gorm/
├── app/
│   ├── app.env.example          # Environment variable template
│   ├── Dockerfile
│   ├── go.mod
│   ├── cmd/
│   │   └── api/
│   │       └── main.go          # App entry point, Fiber routes
│   └── internals/
│       ├── consts/
│       │   └── consts.go        # Config loading via Viper
│       └── services/
│           ├── database/
│           │   ├── database.go  # GORM connection
│           │   └── models.go    # Quote and Course models
│           └── scrapers/
│               ├── toscrape.go           # Quotes scraper
│               └── coursera_courses.go   # Coursera scraper
├── db/
│   └── create_db.sql            # DB initialization
└── docker-compose.yml
```
## API Endpoints

| Method | Path | Description |
|---|---|---|
| GET | `/api/healthchecker` | Health check: returns service status |
| GET | `/scrape/quotes` | Triggers async scraping of quotes.toscrape.com and stores results in PostgreSQL |
| GET | `/scrape/coursera` | Triggers async scraping of coursera.org/browse and stores course data in PostgreSQL |
Scraping jobs run asynchronously; the endpoint returns immediately while scraping continues in the background.
## Database Models

### Quote

- `author`: quote author
- `quote`: quote text

### Course

- `title`, `description`, `creator`, `url`, `rating`
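The structs in `models.go` might look roughly like this. The field names come from the list above; the ID fields, struct tags, and field types (e.g. `Rating` as a string) are assumptions, not the actual schema.

```go
package main

import "fmt"

// Quote is a hypothetical reconstruction of the quotes model.
type Quote struct {
	ID     uint   `gorm:"primaryKey"`
	Author string `gorm:"column:author"`
	Quote  string `gorm:"column:quote"`
}

// Course is a hypothetical reconstruction of the courses model.
type Course struct {
	ID          uint `gorm:"primaryKey"`
	Title       string
	Description string
	Creator     string
	URL         string
	Rating      string // scraped as text; the real type may differ
}

func main() {
	q := Quote{Author: "Albert Einstein", Quote: "Imagination is more important than knowledge."}
	fmt.Println(q.Author)
}
```

Because GORM reads the `gorm:"..."` struct tags, the structs themselves need no GORM import; only the connection code in `database.go` does.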
## Environment Variables

See `app/app.env.example`:

```env
POSTGRES_HOST=colly_db
POSTGRES_PORT=5432
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres
POSTGRES_DB=colly
```
## What It Does
- Registers Colly HTML callbacks before visiting pages (correct callback order).
- Scrapes data from websites and stores it in a PostgreSQL database via GORM.
- Uses Fiber middleware (logger, CORS) applied globally before sub-app routing.