Shih Peng Wen

I am a Python Developer with expertise in Data Science, Natural Language Processing and Testing.

Currently, I work as a Data Analysis Engineer at GoMore.


personal projects

  1. NDLTD TW Papers Graph
    • This project uses Vue.js for the frontend and FastAPI for the backend, with OpenSearch as the vector database.
    • It uses GitHub Actions to automate the CI/CD process and is hosted on AWS.
    • It uses AWS Lambda as a web scraper to gather data, Sentence-Transformer to capture the meaning of the text, and KNN to find articles that are similar to each other.
    • Inspired by: Keyword Analysis (GroundAI)
    • https://ndltd-tw-papers-graph.wspooong.com
  2. LY Transcription
    • LY Transcription helps you download Legislative Gazette records from the Legislative Yuan (立法院) and convert them into JSON files.
    • The files are in DOC format, while Python currently supports processing Word files only in DOCX format, so they must be converted from DOC to DOCX before the transformation can take place.
    • Note: The DOC-to-DOCX conversion relies on Microsoft Office, and I have tested it only on Windows, so this project currently supports Windows only.
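
To make the KNN step in the NDLTD TW Papers Graph project more concrete, here is a minimal sketch of finding similar articles by cosine similarity. It is only an illustration. In the project itself the vectors come from Sentence-Transformer and the search runs in the vector database, whereas the toy vectors, the paper IDs, and the helper names `cosine_similarity` and `k_nearest` below are made up for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def k_nearest(query_id, embeddings, k=2):
    """Return the k items most similar to query_id as (id, score) pairs."""
    query = embeddings[query_id]
    scored = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in embeddings.items()
        if other_id != query_id  # skip the query itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional vectors standing in for real sentence embeddings.
toy_embeddings = {
    "paper_a": [1.0, 0.0, 0.0],
    "paper_b": [0.9, 0.1, 0.0],
    "paper_c": [0.5, 0.5, 0.0],
    "paper_d": [0.0, 0.0, 1.0],
}

neighbors = k_nearest("paper_a", toy_embeddings, k=2)
for paper_id, score in neighbors:
    print(f"{paper_id}: {score:.3f}")
```

A brute-force scan like this only works for small collections, which is one reason to leave the nearest-neighbor search to a vector database.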

skill-based project

  1. Facebook Group Scraper