In this paper, we consider load forecasting for a new user in the system from only a few shots (data points) of its energy consumption. We propose using clustering to mitigate the challenges posed by the limited samples. Specifically, we first design a clustering method based on feature extraction to categorize the historical data. Then, we forecast the load of new users with a two-phase Long Short-Term Memory (LSTM) model, which inherits prior knowledge from the clustering results.
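
As an illustration of the clustering step only, the following minimal sketch groups historical users by the shape of their daily load profile and then matches a new user to a cluster from a handful of observations. Everything concrete here is an assumption made for the example: the synthetic 24-hour profiles, the mean-normalized profile used as the feature, plain k-means as the clustering algorithm, and the helper names `extract_features`, `kmeans`, and `assign_new_user`. The two-phase LSTM is not shown; the selected cluster merely indicates which group's prior knowledge a forecaster could inherit.

```python
import numpy as np

def extract_features(profiles):
    """Scale each 24-hour profile by its mean so clustering compares shape, not magnitude."""
    return profiles / profiles.mean(axis=1, keepdims=True)

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (a stand-in for the clustering method); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every sample to every centroid, shape (n_samples, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels

def assign_new_user(hours, loads, centroids):
    """Pick the cluster whose centroid best matches the few observed (hour, load) points."""
    shape = loads / loads.mean()
    errors = []
    for c in centroids:
        ref = c[hours] / c[hours].mean()  # centroid restricted to the observed hours
        errors.append(np.sum((shape - ref) ** 2))
    return int(np.argmin(errors))

# Synthetic history (assumed): five morning-peak and five evening-peak users.
rng = np.random.default_rng(0)
t = np.arange(24)
morning = 1.0 + np.exp(-0.5 * ((t - 8) / 2.0) ** 2)
evening = 1.0 + np.exp(-0.5 * ((t - 19) / 2.0) ** 2)
history = np.vstack([
    s * base + rng.normal(0, 0.02, 24)
    for base in (morning, evening)
    for s in (1.0, 2.0, 3.0, 4.0, 5.0)
])

centroids, labels = kmeans(extract_features(history), k=2)

# New user: only four observed hours, with an evening-peak shape.
hours = np.array([6, 8, 12, 19])
loads = 2.5 * evening[hours]
cluster = assign_new_user(hours, loads, centroids)
print("historical labels:", labels.tolist())
print("new user assigned to cluster", cluster)
```

Normalizing by the mean is one simple way to make users with different consumption levels comparable; any other feature extractor could be substituted without changing the overall flow.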