Simple Web Scraping Colly App with Fiber

A Go application using Fiber, Colly v2, and GORM to scrape websites and persist data in PostgreSQL.

Prerequisites

  • Docker and Docker Compose (the whole stack runs in containers)

How to Run

  1. Clone the repository.
  2. Navigate to the project directory: cd colly-gorm
  3. Copy the example env file: cp app/app.env.example app/app.env
  4. Start the stack: docker compose up --build

Project Structure

colly-gorm/
├── app/
│   ├── app.env.example              # Environment variable template
│   ├── Dockerfile
│   ├── go.mod
│   ├── cmd/
│   │   └── api/
│   │       └── main.go              # App entry point, Fiber routes
│   └── internals/
│       ├── consts/
│       │   └── consts.go            # Config loading via Viper
│       └── services/
│           ├── database/
│           │   ├── database.go      # GORM connection
│           │   └── models.go        # Quote and Course models
│           └── scrapers/
│               ├── toscrape.go      # Quotes scraper
│               └── coursera_courses.go  # Coursera scraper
├── db/
│   └── create_db.sql                # DB initialization
└── docker-compose.yml

API Endpoints

Method  Path                Description
GET     /api/healthchecker  Health check; returns service status
GET     /scrape/quotes      Triggers async scraping of quotes.toscrape.com and stores results in PostgreSQL
GET     /scrape/coursera    Triggers async scraping of coursera.org/browse and stores course data in PostgreSQL

Scraping jobs run asynchronously; the endpoint returns immediately while scraping continues in the background.

Database Models

Quote

  • author — quote author
  • quote — quote text

Course

  • title, description, creator, url, rating

Environment Variables

See app/app.env.example:

POSTGRES_HOST=colly_db
POSTGRES_PORT=5432
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres
POSTGRES_DB=colly

What It Does

  • Registers Colly HTML callbacks before visiting pages (correct callback order).
  • Scrapes data from websites and stores it in a PostgreSQL database via GORM.
  • Uses Fiber middleware (logger, CORS) applied globally before sub-app routing.